Abstract (long version)

The azimuth/elevation (cross-range) resolution of a bat or dolphin sonar is theoretically much worse than its range resolution at ranges of one-half meter or more. Synthetic aperture sonar (SAS) processing can be used to improve cross-range resolution by summing echoes from the same target point as seen from different relative positions of the sonar and target. Dolphins and bats that transmit broadband, frequency modulated signals use Doppler tolerant waveforms and may be incapable of pulse-to-pulse coherent processing. A relevant form of SAS involves Doppler tolerant, noncoherent delay-and-sum processing, and it is similar to back projection tomography.

Another technique for improving cross-range resolution is to scan the environment with multiple, overlapping beams from a single sensor position, and to solve the resulting set of simultaneous equations for the reflectivity of each point (beam pattern deconvolution). Both beam scanning and translational motion are observed in animals. Beam deconvolution and SAS can be combined in a sonar processor that forms an internal model of the environment and that updates this model by comparing it with echo data. This updating method is known as the algebraic reconstruction technique (ART) in medical imaging applications, and it is equivalent to gradient descent for solving deconvolution (sharpening) problems. Simple, biologically feasible versions of back projection SAS and ART-SAS can form images of reflectivity and other target features as a function of range, azimuth, and elevation. The resulting reflectivity and feature images are used to explain how dolphins find buried fish.

ART-SAS can be implemented with a gradient descent optimization process that uses top-down, bottom-up processing. This processor obtains high-resolution target representations from low-resolution, nonlinear signal representations, and it can be used to generalize spectrogram correlation models to explain bats’ apparent pulse compression capability (matched-filter equivalent processing). Gradient descent parameter estimation can be combined with a target classifier that uses associative memory. The resulting associative gradient descent process has fast convergence and avoids spurious local minima. Gradient descent top-down, bottom-up processing compares low-resolution versions of sensor images (as in the superior colliculus) with low-resolution (deliberately degraded) versions of a high-resolution image model. The high-resolution model is updated via the error generated by the low-resolution comparison. Although conventional image sharpening operators (Laplacian, 2-D high-pass filtering, lateral inhibition) are not precluded by this process, they appear to be unnecessary.

A software implementation of the biologically inspired acoustic imaging system is used to form conventional images and feature images of mines that are ensonified with a dolphin-like pulse.

Some of these images are formed with very sparse aspect sampling, corresponding to area coverage rates that are at least an order of magnitude larger than those of conventional SAS systems. This improvement is associated with bionic, nonlinear suppression of artifacts caused by SAS point spread function sidelobes, and with tolerance to aspect-dependent phase changes induced by physical scattering mechanisms.

Motion compensation and adaptive focusing are obtained with an image-based tracker, a capability that a biological system needs in order to pursue and image prey simultaneously. Image-based tracking accomplishes motion compensation and adaptive focusing by using sequentially formed test images together with an image evaluation criterion. The tracker can incorporate both translational and rotational motion.
1. Introduction


Animal sonar systems are typically characterized by large bandwidths, motion of the transmitter/receiver, and small aperture (array size) relative to man-made sonars. Such systems have cross-range (azimuth and/or elevation) resolution that is much worse than their range resolution. For bats and dolphins, the theoretical disparity between range and cross-range resolution becomes large for ranges in excess of half a meter. This disparity can be mitigated by synthetic aperture sonar (SAS) processing, which forms an image in which cross-range resolution is commensurate with range resolution. With the possible exception of single-pulse Doppler processing, however, conventional SAS appears to be much too complicated for implementation by biological systems.

Recent results indicate that synthetic aperture processing can be greatly simplified. By considering simplifications that allow for biological implementation and generalizations that can emulate animal echolocation capabilities, SAS processing has actually been advanced beyond the previous state of the art for man-made systems. These advances involve high-resolution feature images, image-based tracking for motion compensation, sparse angular sampling, and the utilization of all available knowledge for acoustic imaging. All available knowledge includes prior expectations, non-acoustic sensory information, and acoustic information that is not explicitly associated with imaging, such as resonances.

Experimentally derived images that are shown in this report are from targets that were suspended in lake water. The targets were completely contained within the transmitter/receiver sonar beam width, and echoes were obtained as the targets were rotated. The transmitted signal was a dolphin-like pulse with 10 dB bandwidth between 50 and 150 kHz. The echoes were recorded by ARL, Univ. of Texas at Austin, and were put into PC format and furnished to Chirp Corp. by P. Moore, D. Helweg, and J. Sigurdson, Code D351, SPAWAR Systems Center, San Diego.


2. Doppler-based SAS

Some synthetic aperture systems depend upon Doppler sensitivity while others are Doppler tolerant. A Doppler-based system uses the angle dependence of range-rate for a moving sonar and a stationary scattering point. A relatively large range-rate is observed along the path of the sonar platform motion, and zero range-rate is observed orthogonal to the path of motion. A mapping thus exists between azimuth angle and range-rate. For side-looking systems, an object appears to rotate relative to the sonar, and this rotational motion can be used to form an image of the object.

Range-rate can be measured with a single pulse if the pulse has sufficient time-bandwidth product to estimate relevant Doppler-induced time compressions of the signal [1,2]. If a single pulse has insufficient time-bandwidth product for Doppler-based angle measurements, then a fully coherent system can use multiple pulses for range-rate estimation. In the multi-pulse case, many echoes are stored and processed as though they were all obtained from a long-duration transmitted signal that is composed of many transmitted pulses. Even if a new pulse is transmitted only after the echo from the previous pulse is received, the new pulse is part of the composite signal. This kind of “pulse-Doppler” processing requires a coherent integration time that extends over multiple transmissions and receptions. Different scattering points correspond to different range-rate vs. time histories, and a coherent pulse-Doppler processor is used to form a matched filter for each image point [3-5].

3. Doppler tolerant, tomographic SAS

It is uncertain whether echolocating animals are capable of multi-pulse, coherent processing, although sensitivity to a phase shift of a single echo has been demonstrated in bats [6-8]. A conservative model assumes that multi-pulse coherent processing capability is lacking. Even if bats or dolphins have the ability to coherently sum echoes from different transmissions, a noncoherent SAS processor or a semicoherent SAS processor (summation of envelope-detected matched filter outputs) is advantageous because:

1. The SAS processor is much easier to implement and is more tolerant of small errors between predicted and actual ranges and range-rates;

2. Little resolution is lost when wideband, biological signals are used, since the autocorrelation function of such a signal contains only a few of the fine-structure oscillations that are used for coherent processing, so little is lost by retaining only its envelope;

3. The aspect sampling constraints that are associated with spatial aliasing (the effect of synthetic array element locations that are separated by more than half a wavelength) are removed, and are replaced by graceful degradation for sparse angular sampling (degradation that can be predicted from the peak-to-sidelobe ratio of the SAS point spread function, which is the same as the range, cross-range ambiguity function of the synthetic array processor);

4. The image is tolerant of aspect-dependent phase changes that are introduced by viewing the same target point from different directions.

A wideband, Doppler tolerant, tomographic synthetic aperture processor can be simplified to remove the requirement for fully coherent processing. In the case of dolphins, even the single-pulse matched filter assumption (or a process that is equivalent to matched filtering) is not required. This type of SAS processor is described in the following paragraphs.

A moving sonar transmits signals and receives echoes from a sequence of points along its path of motion. The receiving points along the sonar’s path are regarded as the locations of elements that are part of a large, synthetic array. This synthetic array can focus on a particular point by delay-and-sum beamforming. A compensatory delay is inserted at the output of each element, such that all the echoes from a given scattering point occur at the same time. The resulting time-registered echoes are then added. The delay-and-sum process is equivalent to forming a spatial matched filter for echoes from the chosen scattering point.
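A minimal sketch of the focusing step, assuming a monostatic sensor in a 2-D plane, a sound speed of 1500 m/s, a 1 MHz sample rate and a hypothetical Gaussian-windowed 100 kHz click with a 4 µs envelope (none of these values come from the report). `delay_and_sum` reads each element's echo at the two-way delay predicted for the chosen point and adds the samples: focusing on the true scatterer gives a sum close to the number of elements, while a point 30 cm away gives a small, partly cancelling sum.

```python
import numpy as np

C = 1500.0   # assumed sound speed in water, m/s
FS = 1.0e6   # assumed sample rate, Hz


def click(t, fc=100e3, width=4e-6):
    """Hypothetical dolphin-like click: Gaussian-windowed 100 kHz tone centred at t = 0."""
    return np.exp(-0.5 * (t / width) ** 2) * np.cos(2 * np.pi * fc * t)


def simulate_echoes(sensors, scatterers, n_samples):
    """One echo time series per synthetic-array element (monostatic, two-way delay 2r/C)."""
    t = np.arange(n_samples) / FS
    echoes = np.zeros((len(sensors), n_samples))
    for m, s in enumerate(sensors):
        for point, amp in scatterers:
            r = np.linalg.norm(np.asarray(point) - s)
            echoes[m] += amp * click(t - 2 * r / C)
    return echoes


def delay_and_sum(echoes, sensors, point):
    """Focus on `point`: take each element's sample at that point's two-way delay
    (the compensatory delay) and add the time-registered samples."""
    total = 0.0
    for m, s in enumerate(sensors):
        r = np.linalg.norm(np.asarray(point) - s)
        k = int(round(2 * r / C * FS))  # nearest sample to the predicted delay
        if 0 <= k < echoes.shape[1]:
            total += echoes[m, k]
    return total


# 21 element positions along a 4 m straight path; one point scatterer 5 m away.
sensors = np.column_stack([np.linspace(-2.0, 2.0, 21), np.zeros(21)])
echoes = simulate_echoes(sensors, [((0.0, 5.0), 1.0)], n_samples=10_000)
on_target = delay_and_sum(echoes, sensors, (0.0, 5.0))
off_target = delay_and_sum(echoes, sensors, (0.3, 5.0))
print(on_target, off_target)
```

Nearest-sample lookup keeps the sketch short; interpolating between samples would reduce the residual delay error.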

The delay-and-sum process is also equivalent to reconstructing an image from its projections [9]. Projections occur naturally in radar/sonar data. All scattering points that are within the physical beam width of the sonar and that are at the same range (i.e., that lie along the same constant-range surface) contribute to the same echo sample. The sequence of echo samples on an A-scan (matched filter response vs. range) represents a projection of the scatterer reflectivity distribution along the range axis. Different transmitter/receiver locations correspond to different propagation directions, and thus to different projections of the scatterer reflectivity distribution, as shown in Figure 1.

Several methods can be used to reconstruct the reflectivity distribution from its projections [10]. The back projection algorithm is nearly identical to delay-and-sum beamforming, as demonstrated in Appendix A [11]. Back projection or delay-and-sum beamforming can be implemented sequentially, such that the reflectivity estimate of each pixel is updated with each new echo. The echo sample that corresponds to a given target point is added to the sum of previous samples (one from each previous echo) that correspond to the same point. At a given sonar location, the echo sample corresponding to a chosen target point also corresponds to all the other target and clutter points at the same range. At a new location, the constant-range surface is rotated, and the echo sample for the chosen target point corresponds to other points that are located on a different constant-range surface, as in Figure 1.
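The sequential form of back projection can be sketched as follows. The sketch assumes A-scans that are already pulse-compressed and envelope-detected (modelled as hypothetical Gaussian bumps 1 cm wide), a sensor that circles the scene at 3 m, as when the target is rotated, and 15 looks spaced 12 degrees apart. Each call to `back_project` adds one echo to the running image, spreading every echo sample over the arc of pixels at that range.

```python
import numpy as np


def a_scan(sensor, scatterers, ranges, res=0.01):
    """Hypothetical pulse-compressed, envelope-detected A-scan: a Gaussian bump of
    width `res` metres at the range of each point scatterer."""
    out = np.zeros_like(ranges)
    for (px, py), amp in scatterers:
        r = np.hypot(px - sensor[0], py - sensor[1])
        out += amp * np.exp(-0.5 * ((ranges - r) / res) ** 2)
    return out


def back_project(image, xs, ys, sensor, ranges, scan):
    """Add one A-scan to the running image: each pixel receives the echo sample at
    its own range from this sensor, i.e. the sample is smeared over a constant-range arc."""
    X, Y = np.meshgrid(xs, ys)
    r_pix = np.hypot(X - sensor[0], Y - sensor[1])
    image += np.interp(r_pix, ranges, scan, left=0.0, right=0.0)
    return image


xs = ys = np.linspace(-0.5, 0.5, 101)                  # 1 cm pixels
ranges = np.linspace(0.0, 5.0, 5001)                   # 1 mm range samples
targets = [((0.0, 0.0), 1.0), ((0.2, -0.1), 0.5)]      # two point scatterers
image = np.zeros((ys.size, xs.size))
for ang in np.deg2rad(np.arange(0, 180, 12)):          # 15 looks, one echo at a time
    sensor = (3.0 * np.cos(ang), 3.0 * np.sin(ang))    # sensor position for this look
    image = back_project(image, xs, ys, sensor, ranges, a_scan(sensor, targets, ranges))
```

After all 15 updates, the pixel at the stronger scatterer holds roughly 15 (one unit per look), while pixels crossed by a single arc hold about 1.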

Delay-and-sum synthetic aperture processing does not require Doppler information. In fact, processing would be simplified if the sonar were to stop at each synthetic array element location, transmit a signal, receive the resulting echoes, and then move to the next transmit/receive location. To take advantage of this simplification without stopping, the system can use Doppler tolerant signals, such that the matched filter response is not sensitive to range-rate.

Another simplification is to use noncoherent delay-and-sum beamforming, such that matched filter envelopes are used and phase is discarded. This simplification is feasible with wideband signals, since the matched filter response of such a signal contains comparatively few oscillatory “fine structure” peaks under its envelope, and these peaks carry the phase information. The resulting processor is semicoherent: a matched filter is used for each echo, but different echoes are noncoherently combined by summing the envelope-detected matched filter responses.
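The benefit of the semicoherent combination can be illustrated numerically. In this sketch (assumed 1 MHz sampling, 100 kHz carrier and a Gaussian envelope of 4 µs standard deviation, all hypothetical), fifteen matched filter outputs are summed after delay compensation that is off by up to half a carrier period. The coherent sum partly cancels, while the sum of envelopes, computed with an FFT-based Hilbert transform, stays close to the number of echoes.

```python
import numpy as np


def envelope(x):
    """Envelope = magnitude of the analytic signal (FFT-based Hilbert transform)."""
    n = x.size
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))


FS, FC = 1.0e6, 100e3            # assumed sample rate and centre frequency, Hz
t = np.arange(-200, 200) / FS    # time axis around the focus instant t = 0
centre = t.size // 2             # index of t = 0


def compressed_echo(delay_error):
    """Hypothetical matched-filter output (Gaussian envelope on a 100 kHz carrier)
    after delay compensation that is off by `delay_error` seconds."""
    tau = t - delay_error
    return np.exp(-0.5 * (tau / 4e-6) ** 2) * np.cos(2 * np.pi * FC * tau)


# 15 looks whose residual delay errors span +/- half a carrier period (5 us).
errors = np.linspace(-5e-6, 5e-6, 15)
coherent = sum(compressed_echo(e) for e in errors)                # phase-sensitive sum
semicoherent = sum(envelope(compressed_echo(e)) for e in errors)  # sum of envelopes
print(abs(coherent[centre]), semicoherent[centre])
```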


[Figure 1: diagram of observation points m = 2 and m = 3 along the sensor path, with the range $r_3(A)$ from element 3 to point A.]

Figure 1. As a sensor moves, it creates a synthetic array with elements m = 1, 2, 3, .... The range of point A is different for each of these elements. To focus the synthetic array on point A, the corresponding delays are compensated by a delay-and-sum operation. Each line through point A represents part of a constant-range surface that lies within the beam width of the transmitter. All the scattering points that are on a line contribute to the m-th echo at range $r_m(A)$.
Wideband, short-duration pulses such as those used by dolphins are Doppler tolerant with or without semicoherent processing. Long-duration wideband signals with hyperbolic frequency modulation (linear period modulation), such as those used (or approximated) by many FM bats, are Doppler tolerant when they are processed by a semicoherent receiver [12-14]. In the case of dolphins, a matched filter may be approximated by the band-pass operation of the receiver, since the signal has a very small time-bandwidth product. A tomographic SAS model for dolphins can thus use noncoherent processing without a matched filter assumption.

A tomographic SAS model for bats requires pulse compression via matched filtering, inverse filtering, or an equivalent process, together with noncoherent pulse-to-pulse summation capability. A process that is equivalent to matched filtering or inverse filtering may be synthesized by spectrogram correlation [15,16] or by a time-frequency plane version of the top-down, bottom-up gradient descent process discussed in Section 9.

4. The range, cross-range ambiguity function (SAS point spread function)

The delay-and-sum receiver response to a point scatterer is the sum of the rotated constant-range curves in Figure 1. This sum is shaped like an asterisk. For M different sensor positions, the center point of the asterisk is M times larger than an individual line, as shown in Figure 2.

A sampled version of a two-dimensional reflectivity distribution is an array of sample points with different reflectivities. The back projection SAS image of the sampled reflectivity distribution is a superposition of weighted, shifted versions of the function in Figure 2, where the weights correspond to the sample point reflectivities and the shifts correspond to the locations of the sample points. This weighted sum is a discrete convolution operation. The image of the reflectivity distribution is the convolution of the function in Figure 2 with the actual distribution (or a real, non-negative version of the actual distribution).
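The asterisk and its peak-to-sidelobe ratio can be reproduced with a short sketch. It assumes far-field, short-pulse looks, so that each constant-range arc is approximated by a straight line with a Gaussian range profile one pixel wide; the grid size and the exclusion radius are illustrative choices. With 15 looks 12 degrees apart, the hub of the asterisk reaches 15 while each arm stays near 1, giving a ratio close to M = 15, as in Figure 2.

```python
import numpy as np


def line_psf(n=201, look_angles_deg=np.arange(0, 180, 12), sigma=1.0):
    """Back-projection PSF of a point at the grid centre.

    Assumes far-field, short-pulse looks: each look smears the point along a straight
    constant-range line orthogonal to the look direction, with a Gaussian range
    profile of `sigma` pixels.  Summing the looks gives the asterisk."""
    y, x = np.mgrid[0:n, 0:n] - n // 2          # pixel offsets from the point
    psf = np.zeros((n, n))
    for a in np.deg2rad(look_angles_deg):
        d = x * np.cos(a) + y * np.sin(a)       # range offset as seen from this look
        psf += np.exp(-0.5 * (d / sigma) ** 2)
    return psf


psf = line_psf()                                # M = 15 looks, 12 degrees apart
n = psf.shape[0]
y, x = np.mgrid[0:n, 0:n] - n // 2
far = np.hypot(x, y) > 30                       # skip the hub, where the arms still overlap
ps_ratio = psf.max() / psf[far].max()
print(ps_ratio)
```

Because back projection is linear, the image of several scatterers is the weighted sum of shifted copies of this PSF, which is the discrete convolution described above.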


Figure 2. The point spread function (SAS range, cross-range ambiguity function) of a tomographic SAS with 12-degree angle increments over 180 degrees. Peak-to-sidelobe ratio = number of echoes used to construct the image = 180/12 = 15.
The actual reflectivity distribution is convolved with (or smeared by) the function in Figure 2, which is known as the point spread function (PSF). The function in Figure 2 also represents the response of a receiver that hypothesizes a point scatterer at the center of the asterisk when the actual point scatterer is at various positions on the image plane of the figure. The function in Figure 2 is thus the range, cross-range ambiguity function (RCRAF) of the imaging system, as well as the point spread function [17,18]. This ambiguity function has a peak-to-sidelobe ratio of M, where M is the number of different sensor positions or elements in the synthetic array. When M is small (as in a monaural or binaural system with no SAS capability), a strongly reflecting point at one location can severely affect the image of a weakly reflecting point at a different location. PSF sidelobes produce artifacts when a scattering point is much stronger than its neighbors for at least one aspect angle. The artifacts appear as lines that pass through the strong scattering point.

If the peak-to-sidelobe ratio (P/S) of the range, cross-range ambiguity function (RCRAF) is small, different points on a distributed target can interfere with one another, leading to a self-clutter effect. A target that is surrounded by other scatterers also will be difficult to detect. A psychometric procedure that measures angular accuracy or resolution between closely spaced points may depend only on the sharpness of the central peak of the binaural RCRAF, and can therefore be misleading with respect to detection in clutter and classification of distributed targets.

For a binaural system, the RCRAF has P/S = 2 and is scissor-shaped. As a binaural sonar approaches a target, the angle between the scissor blades increases. If range resolution is sufficiently fine (if the width of the scissor blades is small) and if echo samples corresponding to each point on the target are summed as the sonar approaches the target, then P/S becomes larger and the image becomes less ambiguous as the target is approached. Binaural processing thus can be used with forward-looking SAS to create an acoustic image. Biomimetic nonlinear processing can be used to reduce the effects of large point spread function sidelobes.

5. Comparison of biologically inspired SAS with conventional SAS

Biologically inspired SAS has at least two properties that make it different from conventional Doppler tolerant (tomographic) SAS:

1. Biomimetic nonlinear suppression of the sidelobes of the point spread function (PSF) of SAS images;

2. Noncoherent (phase tolerant) summation over different aspect angles, which decreases the sensitivity of the SAS image to aspect-dependent phase shifts that depend on target shape.

Different phase shifts are expected to occur when various target structures are viewed from different aspects. When a flat metal plate is tilted so that it does not reflect energy directly back toward a receiver, its impulse response (as approximated by physical optics) changes from a single positive impulse to a positive impulse at the leading edge and a negative impulse at the trailing edge. The impulse response of a spherical scatterer is approximated by an impulse followed by a rectangular function. The rectangular function is synthesized via an integrator with positive weight followed by another, delayed integrator with negative weight. In general, the echo from a complex target can be represented as a weighted sum of delayed versions of the transmitted signal, integrated versions of the signal, and differentiated versions of the signal. The weights and delays in this sum are aspect dependent; the weights can change sign (e.g., from positive to negative) depending on aspect. The integration and differentiation operations induce 90-degree phase shifts.
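The 90-degree phase shifts introduced by integration and differentiation, and the 180-degree shift of a sign change, can be checked with a single-frequency test signal. The sketch assumes a 100 kHz tone sampled at 1 MHz and measures the phase of each copy from one DFT bin; the central-difference derivative and the trapezoid-style integral are illustrative discretizations, not the report's processing.

```python
import numpy as np

FS = 1.0e6                         # assumed sample rate, Hz
F0 = 100e3                         # assumed test frequency, Hz
t = np.arange(2000) / FS           # 2 ms = exactly 200 periods of F0
s = np.cos(2 * np.pi * F0 * t)     # reference copy of the signal


def phase_deg(x):
    """Phase of x at F0 relative to cos(2*pi*F0*t), from a single DFT bin."""
    return np.degrees(np.angle(np.sum(x * np.exp(-2j * np.pi * F0 * t))))


derivative = np.gradient(s, 1 / FS)         # differentiated copy (central differences)
integral = (np.cumsum(s) - 0.5 * s) / FS    # integrated copy (trapezoid rule, up to a constant)
integral -= integral.mean()                 # remove the constant of integration
negated = -s                                # sign flip, as for a trailing edge
for name, x in [("signal", s), ("derivative", derivative),
                ("integral", integral), ("negated", negated)]:
    print(name, round(float(phase_deg(x)), 1))
```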
An example of an aspect-dependent sign change is shown in Figure 3. A tilted rectangular plate is composed of material with higher acoustic velocity than water (and, for a metal plate, higher acoustic impedance). The physical optics approximation to the target impulse response is a positive impulse from the near edge of the plate followed by a negative impulse from the far edge. The impulse response of the edge that is initially closest to the sonar therefore changes from positive to negative as the plate rotates.

Conventional SAR/SAS systems implicitly assume that targets are composed of independent point scatterers with aspect-independent phase shifts. This assumption has not been deleterious because conventional systems generally do not view targets over observation angles that exceed 120 degrees. Insensitivity to aspect-dependent phase shifts is obtained via noncoherent summation of matched filtered echoes over different aspect angles. Such noncoherent summation is feasible with minimal loss of resolution when signals are very broadband, as in biological sonar systems.
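The cancellation that phase-sensitive processing can suffer, and the way noncoherent summation avoids it, can be sketched with a hypothetical edge whose registered matched filter output alternates in sign from one aspect to the next (the sampling rate, carrier and envelope width are assumed values). The phase-sensitive sum cancels exactly, while the sum of envelopes equals the number of looks.

```python
import numpy as np


def envelope(x):
    """Envelope = magnitude of the analytic signal (FFT-based Hilbert transform)."""
    n = x.size
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))


FS, FC = 1.0e6, 100e3                      # assumed sample rate and centre frequency, Hz
t = np.arange(-200, 200) / FS
centre = t.size // 2                       # index of t = 0
mf = np.exp(-0.5 * (t / 4e-6) ** 2) * np.cos(2 * np.pi * FC * t)  # registered MF output

# Hypothetical edge: impulse response positive from one side, negative from the other.
signs = [+1.0, -1.0, +1.0, -1.0]
coherent = sum(w * mf for w in signs)               # phase-sensitive sum: the edge cancels
noncoherent = sum(envelope(w * mf) for w in signs)  # envelope sum: the edge survives
print(coherent[centre], noncoherent[centre])
```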

PSF (point spread function) sidelobe suppression and tolerance of aspect-dependent phase shifts tend to make BioSAS comparatively tolerant of sparse aspect sampling over large angular observation intervals. Sparse sampling in aspect decreases the peak-to-sidelobe ratio of the PSF, and a method that suppresses the effect of such sidelobes leads to better images when aspect samples are far apart. Large aspect changes may result in different phase shifts from the same scattering point, as in Figure 3, and tolerance of such phase shifts tends to increase image quality.
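The report does not detail its biomimetic nonlinearity at this point, so the sketch below substitutes a plainly named stand-in: per-look back-projected images (already envelope-detected, hence phase tolerant) are combined with a geometric mean instead of an arithmetic mean. This is an assumption for illustration, not the BioSAS operator. A geometric mean stays high only where every look agrees, so the arms of the asterisk, which are bright in just two of the 18 looks, are pushed toward the floor `EPS` (a hypothetical constant). In this straight-line, far-field approximation, opposite looks (a and a + 180 degrees) produce the same line, so the linear peak-to-sidelobe ratio with 18 looks is 9 rather than 18.

```python
import numpy as np


def per_look_images(n=201, look_angles_deg=np.arange(0, 360, 20), sigma=1.0):
    """One back-projected image per look for a point scatterer at the grid centre
    (far-field assumption: straight constant-range lines, range profile `sigma` px)."""
    y, x = np.mgrid[0:n, 0:n] - n // 2
    return np.stack([np.exp(-0.5 * ((x * np.cos(a) + y * np.sin(a)) / sigma) ** 2)
                     for a in np.deg2rad(look_angles_deg)])


def peak_to_sidelobe(img, exclude_radius=30):
    """Peak value divided by the largest value outside a disc around the scatterer."""
    n = img.shape[0]
    y, x = np.mgrid[0:n, 0:n] - n // 2
    return img.max() / img[np.hypot(x, y) > exclude_radius].max()


looks = per_look_images()                               # 18 looks, 20 degrees apart
linear = looks.mean(axis=0)                             # ordinary back projection
EPS = 1e-3                                              # hypothetical floor before the log
geometric = np.exp(np.log(looks + EPS).mean(axis=0))    # stand-in nonlinear combiner
ps_linear = peak_to_sidelobe(linear)
ps_geometric = peak_to_sidelobe(geometric)
print(ps_linear, ps_geometric)
```

A product-like combiner is only one way to reward agreement among looks; it also penalizes genuine scatterers that are seen from only a few aspects, which is why the floor `EPS` is needed.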

BioSAS can be compared with conventional techniques by ascertaining the separate effects of each BioSAS property, i.e., (1) PSF sidelobe suppression and (2) insensitivity to aspect-dependent phase shifts. This comparison is illustrated in Figure 4. The images in Figure 4 are generated from echoes obtained from the Manta mine in the SPAWAR/ARL data set (SPAWAR Code 351, San Diego, CA). The Manta is viewed over 360 degrees at four-degree intervals. An image generated by a conventional SAS processor is shown in Figure 4a. In Figure 4b, phase tolerance is included, but sidelobe suppression is lacking. In Figure 4c, sidelobe suppression is used without phase tolerance. Finally, in Figure 4d, both sidelobe suppression and phase tolerance are used, thus converting the conventional SAS into BioSAS. The images in Figure 4 indicate that a majority of target points have echo phase shifts that are sensitive to aspect changes. These target points appear on the phase-tolerant images but are suppressed on the phase-sensitive images.

To assess the effect of sparse aspect sampling, the four processors in Figure 4 can be compared for a large aspect sampling interval of 20 degrees (a total of eighteen “looks” at the target over a 360-degree interval). The results are shown in Figure 5.


[Figure 3: diagram of a tilted plate; arrows indicate the impulse response when the plate is seen from the left side and when it is seen from the right side.]


Figure 3. A tilted metal plate in water has a back-scatter impulse response composed of a positive impulse followed by a negative one, even when the plate rotates 180 degrees. Cancellation of edge images can occur with phase-sensitive processing over 180 degrees or more.
[Figure 4 panels: (a) no PSF sidelobe suppression, no phase tolerance (conventional SAS); (b) no PSF sidelobe suppression, with phase tolerance; (c) with PSF sidelobe suppression, no phase tolerance; (d) with PSF sidelobe suppression, with phase tolerance (BioSAS).]


Figure 4. Comparison of SAS images generated with four different processors using a signal with 100 kHz bandwidth and 100 kHz center frequency, with aspect samples that are 4 degrees apart and an angular observation interval of 360 degrees: (a) fully coherent processing; (b) semicoherent processing (noncoherent summation over aspect for tolerance of aspect-dependent phase shifts); (c) full coherence combined with nonlinear processing to reduce sidelobes of the SAS point spread function (the range, cross-range ambiguity function); (d) semicoherent processing for tolerance of aspect-sensitive phase shifts combined with nonlinear PSF sidelobe reduction (biologically inspired SAS processing).
[Figure 5 panels: (a) no PSF sidelobe suppression, no phase tolerance (conventional SAS); (b) no PSF sidelobe suppression, with phase tolerance; (c) with PSF sidelobe suppression, no phase tolerance; (d) with PSF sidelobe suppression, with phase tolerance (BioSAS).]


Figure 5. Comparison of SAS images generated with four different processors using a signal with 100 kHz bandwidth and 100 kHz center frequency, with aspect samples that are 20 degrees apart and an angular observation interval of 360 degrees: (a) fully coherent processing; (b) semicoherent processing (noncoherent summation over aspect for tolerance of aspect-dependent phase shifts); (c) full coherence combined with nonlinear processing to reduce sidelobes of the SAS point spread function (the range, cross-range ambiguity function); (d) semicoherent processing for tolerance of aspect-sensitive phase shifts combined with nonlinear PSF sidelobe reduction (biologically inspired SAS processing).
6. The algebraic reconstruction technique (ART) and top-down, bottom-up processing

A range sample of a pulse-compressed echo amplitude vs. range plot (A-scan) corresponds to the projection (the sum of the reflectivities) of all the pixels in a constant-range surface. If the sonar moves, the measured sample value represents one of many simultaneous linear equations that theoretically can be solved to obtain the reflectivity distribution. The corresponding matrix equation can be solved by a gradient descent optimization technique. Echo samples that are generated from a model of the reflectivity distribution are compared with actual echo data samples, and the difference is used iteratively to improve the model. This method is called the algebraic reconstruction technique (ART) [10]. ART is a special case of an iterative sharpening or deconvolution algorithm. Solution of the matrix equation is equivalent to applying an inverse matrix to the observations, where the matrix that is inverted includes the point spread function. If ART processing can invert the point spread function (i.e., convert the PSF into an impulse), then the ART image should be superior to the back projection image.
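ART is commonly written as the Kaczmarz row-action iteration, and the sketch below uses that form. It is an illustration under stated assumptions, not the report's implementation: the system matrix is a hypothetical random 0/1 matrix standing in for the constant-range bins of several looks, and the echo samples are noise-free. For contrast, a normalized back projection `A.T @ b / A.sum(axis=0)` is computed from the same data; it stays blurred, whereas the ART estimate recovers the reflectivities.

```python
import numpy as np


def art(A, b, sweeps=500, relax=1.0):
    """Algebraic reconstruction technique in its Kaczmarz (row-action) form.

    Each row a_i of A says how much every pixel contributes to one echo sample b_i
    (one constant-range bin for one look).  For each row, the predicted sample
    a_i @ x is compared with the measured b_i, and the difference is spread back
    over the contributing pixels in proportion to a_i."""
    x = np.zeros(A.shape[1])
    row_energy = np.einsum("ij,ij->i", A, A)
    for _ in range(sweeps):
        for a_i, b_i, e_i in zip(A, b, row_energy):
            if e_i > 0:
                x += relax * (b_i - a_i @ x) / e_i * a_i
    return x


# Stand-in system: random 0/1 rows play the role of constant-range bins.
rng = np.random.default_rng(1)
A = (rng.random((80, 16)) < 0.5).astype(float)   # 80 echo samples, 16 pixels
x_true = rng.random(16)                          # unknown reflectivities
b = A @ x_true                                   # noise-free echo samples
x_art = art(A, b)
x_bp = A.T @ b / A.sum(axis=0)                   # normalized back projection, for contrast
print(np.abs(x_art - x_true).max(), np.abs(x_bp - x_true).max())
```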

Top-down, bottom-up processing is a cognitive model that describes the interaction between an internal (top-down) representation and sensory input (bottom-up) data [21,60]. A comparison between predicted and observed data results in a correction or modification of the internal representation. The comparison and correction can be implemented as a gradient descent algorithm in which a high-resolution internal model is used to synthesize the input data that would be observed at a neuronal processing center. The comparison can occur in a relatively low-resolution representation, e.g., in a cochlear time-frequency representation or in an image that is represented by the superior colliculus [22].

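The mean-squared error between the echo and model representations is

$$
\mathrm{MSE}(A) = \frac{1}{2\pi M R_{\max}} \, E\left\{ \sum_{m=1}^{M} \int_{0}^{2\pi} \int_{0}^{R_{\max}} \left[ \mathrm{echo}_m(r,\theta) - \mathrm{mdl}_m(A; r,\theta) \right]^2 dr \, d\theta \right\} \tag{1}
$$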

where the parameter matrix $A$ contains sample values $a_{ij}$ of the high-resolution representation and $E\{\cdot\}$ denotes an ensemble average over various realizations of the echo and the model. The echo varies because it is corrupted by noise, and the model may vary because it can include stochastic neuronal responses. The sum over $m$ represents observations from $M$ different aspect angles, as in Figure 1. The simplest echo model is a smeared version of the high-resolution representation:

$$
\mathrm{mdl}_m(A; r,\theta) = \sum_{i}\sum_{j} a_{ij} \, \mathrm{smear}_m(r - r_i,\ \theta - \theta_j) \tag{2}
$$


The smearing function represents the loss of angular resolution that is associated with a wide physical beam width. If the signal is a short-duration pulse, the $m$-th echo of a given target point is corrupted by a line-like smearing function that is orthogonal to the propagation direction, as in Figure 1.

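The gradient descent technique iteratively solves for the high resolution samples $a_{ij}$ via the
recursion

$$a_{kl}(n+1) = a_{kl}(n) - \mu\, \frac{\partial\, \mathrm{MSE}(A)}{\partial a_{kl}} \quad \text{for all values of } k \text{ and } l \qquad (3)$$

where n is the number of times the iteration has been repeated and μ is the step size. The LMS
stochastic gradient algorithm uses only the squared error rather than the mean-squared error,
yielding an update equation

$$a_{kl}(n+1) = a_{kl}(n) - \mu\, \frac{\partial\, \mathrm{SE}(A;r,\theta,m)}{\partial a_{kl}} \quad \text{for all values of } k \text{ and } l \qquad (4)$$

where $\mathrm{SE}(A;r,\theta,m) = [\mathrm{echo}_m(r,\theta) - \mathrm{mdl}_m(A;r,\theta)]^2$ is the squared error at a single sample.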


In the LMS algorithm, the iteration is computed at successive values of the variables (r, θ) and
observations m = 1, ..., M. The iteration is sufficiently repetitive to approximate an ensemble
average over multiple observations. The LMS (or LMS stochastic gradient) algorithm is very
simple to implement and has many engineering applications [23-25].

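The gradient in (4) depends only upon the error at a particular (r, θ) value and the partial
derivative of the model with respect to $a_{kl}$:

$$\frac{\partial\, \mathrm{SE}(A;r,\theta,m)}{\partial a_{kl}} = -2\left[\mathrm{echo}_m(r,\theta) - \mathrm{mdl}_m(A;r,\theta)\right]\left[\frac{\partial\, \mathrm{mdl}_m(A;r,\theta)}{\partial a_{kl}}\right] \qquad (5)$$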

In (5), the error between the (bottom-up) echo and the (top-down) model is measured in a low
resolution or smeared representation. For the simple convolution or smearing model in (2),

$$\frac{\partial\, \mathrm{mdl}_m(A;r,\theta)}{\partial a_{kl}} = \mathrm{smear}_m(r - r_k,\ \theta - \theta_l) \qquad (6)$$


In the LMS algorithm, the gradient of the squared error equals, up to the factor −2 in (5), [the
error at (r, θ), as measured in a low resolution representation] × [the smearing function for a
relevant area of the high resolution image]. If the smearing function is broad in bearing and narrow
in range as in Figure 1, the update equation uses the error at a given range value to update a swath
of high-resolution pixels in the internal model. This update operation is the same as ART. The high
resolution representation is updated by using the error in the low resolution representation along
with the known smearing function.
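To make the update concrete, here is a minimal Python sketch of (2) and (4)-(6) on a small Cartesian grid. It assumes four hypothetical aspect angles whose smearing functions are uniform five-pixel line segments orthogonal to each look direction, noise-free echoes, and illustrative helper names (`make_smears`, `smeared_model`, `lms_art`) that do not come from the source. Each echo sample's error is spread back over the swath of high-resolution pixels covered by its smearing function, which is the ART-style update described above.

```python
def make_smears(width=5):
    """Hypothetical line-like smearing functions: one uniform segment per aspect angle."""
    h = width // 2
    cross_range_dirs = [(1, 0), (0, 1), (1, 1), (1, -1)]  # orthogonal to each look direction
    return [[(s * di, s * dj) for s in range(-h, h + 1)] for di, dj in cross_range_dirs]


def smeared_model(a, offsets, w):
    """Eq. (2): low-resolution model, sum of a[k][l] * smear(i - k, j - l)."""
    n = len(a)
    return [[sum(w * a[i - di][j - dj] for di, dj in offsets
                 if 0 <= i - di < n and 0 <= j - dj < n)
             for j in range(n)] for i in range(n)]


def lms_art(echoes, smears, n, mu=1.0, sweeps=80):
    """Eqs. (4)-(6): a_kl <- a_kl + 2 * mu * error * smear_m(offset), one sample at a time."""
    a = [[0.0] * n for _ in range(n)]
    w = 1.0 / len(smears[0])
    for _ in range(sweeps):
        for echo, offsets in zip(echoes, smears):
            for i in range(n):
                for j in range(n):
                    support = [(i - di, j - dj) for di, dj in offsets
                               if 0 <= i - di < n and 0 <= j - dj < n]
                    err = echo[i][j] - sum(w * a[k][l] for k, l in support)
                    for k, l in support:  # update the swath covered by the smear
                        a[k][l] += 2.0 * mu * err * w
    return a


def squared_error(a, echoes, smears):
    """Total squared error between echoes and the smeared model, summed over aspects."""
    w = 1.0 / len(smears[0])
    total = 0.0
    for echo, offsets in zip(echoes, smears):
        pred = smeared_model(a, offsets, w)
        total += sum((e - p) ** 2 for er, pr in zip(echo, pred) for e, p in zip(er, pr))
    return total


n = 12
truth = [[0.0] * n for _ in range(n)]
truth[4][4], truth[8][6] = 1.0, 0.5          # two hypothetical point reflectors
smears = make_smears()
w = 1.0 / len(smears[0])
echoes = [smeared_model(truth, offs, w) for offs in smears]
estimate = lms_art(echoes, smears, n)
err_before = squared_error([[0.0] * n for _ in range(n)], echoes, smears)
err_after = squared_error(estimate, echoes, smears)
peak = max(((i, j) for i in range(n) for j in range(n)), key=lambda p: estimate[p[0]][p[1]])
print(err_before, err_after, peak)
```

The step size is chosen so that each sample update removes a fraction of that sample's error, which keeps the per-sample iteration stable.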

Images that are obtained from back projection BioSAS and from ART-BioSAS are shown in
Figure 6 for a Manta mine that is observed over a ninety degree interval with ten degree increments
between observations (nine observation points at 1 deg, 11 deg, ..., 81 deg). As expected, ART-SAS
yields a better image at the expense of the increased processing time that is associated with
multiple iterations.

Conventional  image  sharpening  techniques  such  as  high  pass  filtering  via  a  Laplacian  operator 
and/or  lateral  inhibition  are  prevalent  in  biological  systems  [61-63],  but  seem  to  be  unnecessary  for 
top-down,  bottom-up  gradient  descent. 
Figure 6. Manta with actuator, high frequency signal, observations at ten degree intervals
over 90 degrees (nine observation points), no shadow compensation.

Top: Bionic SAS back projection image. Bottom: Bionic SAS ART image.
7. Biologically inspired techniques for accelerating a gradient descent algorithm and finding a global minimum: associative gradient descent

The iterative operations in (4)-(6) can be accelerated via parallel processing. All of the
elements of the A-matrix, for example, can be updated simultaneously by a parallel implementation
of (4) for all (k,l) values, as in an ideal gradient computation. A global minimum can be pursued
in spite of local minima by using the simplex method [59] or a genetic algorithm [26], which is
similar to the simplex technique. For a given local minimum, an associative memory can suggest
other solutions (other A-matrices or interpretations of the same data) that may correspond to a
smaller, global minimum.
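The local-minimum problem, and the benefit of refining several candidate solutions, can be seen in one dimension. The sketch below is a hypothetical toy rather than the source's algorithm: the cost J(x) = (x² − 1)² + 0.3x stands in for MSE(A), and the list `hypotheses` stands in for alternative starting points that an associative memory might propose. Each hypothesis is refined by plain gradient descent (the refinements are independent, so they could run in parallel), and the candidate with the smallest final cost is kept.

```python
def cost(x):
    """Toy double-well cost standing in for MSE(A): one local and one global minimum."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def grad(x):
    """Derivative of the toy cost."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def descend(x, mu=0.05, steps=500):
    """Plain gradient descent: x <- x - mu * dJ/dx."""
    for _ in range(steps):
        x -= mu * grad(x)
    return x


single = descend(2.0)                        # one starting point: trapped near x = +0.96
hypotheses = [2.0, 0.5, -0.5, -2.0]          # hypothetical suggestions from an associative memory
refined = [descend(h) for h in hypotheses]   # independent, so these could run in parallel
best = min(refined, key=cost)                # keep the lowest-cost interpretation
print(single, best, cost(single), cost(best))
```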

Perceptual  alternation  (e.g.,  the  Necker  cube)  and  visual  illusions  suggest  that  the  brain  uses 
prior  knowledge  and  other  information  to  speed  up  the  iteration  process  and  to  organize  sparse 
image  data  into  a  picture  that  is  commensurate  with  the  viewer’s  experience.  A  relatively  small 
number  of  observations  is  used  to  hypothesize  the  final,  high-resolution  image,  and  this  image 
hypothesis is introduced into the top-down data representation [27]. Multiple hypotheses can be
introduced  simultaneously  via  parallel  processing.  An  image  hypothesis  can  be  generated  from 
prior  expectations  and  an  associative  memory  that  is  activated  by  a  smeared  (low  resolution) 
image,  resonance  phenomena,  echo  time-frequency  distributions,  data  from  nonacoustic  sensors, 
and  other  data  that  are  not  directly  associated  with  imaging.  A  model  for  an  associative  memory  is 
a  k-nearest  neighbor  classifier  in  feature  space  [28],  or  a  neural  network  that  can  be  trained  to 
make  the  correct  association  for  various  versions  of  an  incomplete  image.  All  kinds  of  relevant 
information  can  be  inserted  into  an  ART-like  tomographic  SAS  imaging  algorithm  in  order  to 
accelerate  the  iterative  process  and  to  obtain  a  global  minimum.  This  process  might  be  called 
“associative  gradient  descent.” 
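A k-nearest-neighbor associative memory of the kind mentioned above can be sketched in a few lines. In this hypothetical example the memory holds high-resolution templates, the observation is a smeared (three-tap moving average) and slightly perturbed view of one of them, and the memory returns the templates whose smeared versions lie closest to the observation. The returned templates could then serve as starting hypotheses for the A-matrix in the gradient descent; the templates, the smearing function, and the names `smear` and `recall` are illustrative assumptions.

```python
import math


def smear(x):
    """Hypothetical low-resolution view: three-tap moving average with zero padding."""
    n = len(x)
    return [sum(x[j] for j in (i - 1, i, i + 1) if 0 <= j < n) / 3.0 for i in range(n)]


def recall(observed, memory, k=1):
    """k-nearest-neighbor associative memory: rank stored high-resolution templates by the
    distance between their smeared versions and the observed low-resolution data."""
    return sorted(memory, key=lambda tpl: math.dist(smear(tpl), observed))[:k]


memory = [
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 0],
]
noise = [0.02, -0.01, 0.03, 0.0, -0.02, 0.01, 0.0, -0.03]
observed = [v + e for v, e in zip(smear(memory[2]), noise)]   # incomplete, blurred view
hypotheses = recall(observed, memory, k=2)                    # candidate starting A-matrices
print(hypotheses)
```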

If  the  predicted  echo  incorporates  hypotheses  about  multiple  propagation  paths,  then  the 
resulting  comparator  is  part  of  a  RAKE  or  matched  field  receiver.  A  RAKE  receiver  correlates 
input  data  with  the  expected  version  of  the  data  (e.g.,  with  a  predicted  target  echo  that  is  passed 
through  a  multipath  channel)  [29].  A  “matched  field”  receiver  performs  the  same  operation  at 
multiple  receiving  sites  (the  locations  of  physical  or  synthetic  array  elements).  The  correlation 
process  can  be  implemented  as  part  of  a  mean-square  error  computation.  For  energy  normalized 
echoes  and  models,  a  receiver  that  is  equivalent  to  a  correlator  can  be  obtained  by  squaring  the 
difference between received and predicted echo data, $\mathrm{echo}_m(r,\theta) - \mathrm{mdl}_m(A;r,\theta)$, and integrating
the squared error over range and cross-range coordinates.
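The equivalence between squared error and correlation for energy-normalized data follows from ||x − y||² = ||x||² + ||y||² − 2⟨x, y⟩ = 2 − 2⟨x, y⟩ when both vectors have unit energy, so minimizing the squared error over hypotheses is the same as maximizing the correlation. The sketch below checks this for a few hypothetical multipath channels (lists of delay and gain pairs) applied to a short template, in the spirit of a RAKE comparator; the template, channels, and helper names are assumptions for illustration.

```python
import math


def normalize(x):
    """Scale a real sequence to unit energy."""
    energy = math.sqrt(sum(v * v for v in x))
    return [v / energy for v in x]


def multipath(template, paths, length):
    """Predicted echo: sum of delayed, scaled copies of the template (hypothetical channel)."""
    out = [0.0] * length
    for delay, gain in paths:
        for i, v in enumerate(template):
            if 0 <= i + delay < length:
                out[i + delay] += gain * v
    return out


template = [1.0, -0.5, 0.25, -0.125]
length = 24
echo = normalize(multipath(template, [(3, 1.0), (9, 0.6)], length))
channel_hypotheses = [[(3, 1.0)], [(3, 1.0), (9, 0.6)], [(5, 1.0), (9, 0.6)], [(3, 1.0), (12, 0.6)]]

results = []
for paths in channel_hypotheses:
    pred = normalize(multipath(template, paths, length))
    sq_err = sum((e - p) ** 2 for e, p in zip(echo, pred))   # squared-error comparator
    corr = sum(e * p for e, p in zip(echo, pred))            # correlator
    results.append((sq_err, corr))
    print(paths, sq_err, corr)
```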

8.  Top-down,  bottom-up  gradient  descent  and  inverse  filtering 

Top-down,  bottom-up  gradient  descent  can  be  applied  in  the  time-frequency  plane  to  yield  a 
process  that  is  equivalent  to  inverse  filtering  (deconvolution  of  echoes  with  respect  to  the 
transmitted signal). In this case, the A-matrix, which represents the high-resolution, top-down
internal  model,  is  a  sampled  version  of  the  hypothesized  target  impulse  response.  The  top-down 
cochlear  representation  is  formed  by  convolving  the  hypothesized  target  impulse  response  with  the 
transmitted  signal  and  passing  the  resulting  echo  through  a  cochlear  model.  This  top-down  version 
of  the  cochlear  output  is  compared  with  the  cochlear  response  to  the  actual  echo.  The  comparison 
is  used  to  improve  the  high  resolution  internal  model  of  the  target  impulse  response  via  the  LMS 
algorithm. 
The cochlear time-frequency representation involves nonlinear processing, and a major
advantage of the LMS algorithm is that it easily handles nonlinearities. The only inconvenience
with respect to a nonlinearity is that the partial derivative $\partial\,\mathrm{mdl}(A;r,\theta)/\partial a_{kl}$ in (6) may be
difficult to evaluate analytically, and may require empirical evaluation via the difference equation

$$\frac{\partial\, \mathrm{mdl}(A;r,\theta)}{\partial a_{kl}} \approx \frac{\mathrm{mdl}(A_{kl}^{\varepsilon};r,\theta) - \mathrm{mdl}(A;r,\theta)}{\varepsilon} \qquad (7)$$

where $A_{kl}^{\varepsilon}$ is the same as A except that a small increment ε has been added to element kl. For
energy normalized cochlear representations, calculating the mean-square error is equivalent to
correlating the cochlear representation of the echo with a reference function that is based on a
hypothesized target model (spectrogram correlation).
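The finite-difference gradient in (7) combines easily with the LMS update. In the hypothetical sketch below, square-law detection of the convolution of a known signal `u` with an impulse-response hypothesis `h` stands in for the nonlinear cochlear model; the gradient with respect to each element of `h` is estimated with (7) instead of being derived analytically, and the estimate is checked against the analytical derivative of this simple model at one point. The signal, the true response, the step size, and the function names are assumptions.

```python
def convolve(u, h):
    """Linear convolution of the transmitted signal with the impulse-response hypothesis."""
    out = [0.0] * (len(u) + len(h) - 1)
    for i, ui in enumerate(u):
        for k, hk in enumerate(h):
            out[i + k] += ui * hk
    return out


def mdl(h, u):
    """Stand-in nonlinear model: square-law detection of the convolved echo."""
    return [v * v for v in convolve(u, h)]


def fd_gradient(h, u, t, k, eps=1e-6):
    """Eq. (7): [mdl(h with eps added to element k) - mdl(h)] / eps at sample t."""
    bumped = list(h)
    bumped[k] += eps
    return (mdl(bumped, u)[t] - mdl(h, u)[t]) / eps


def total_error(h, u, target):
    return sum((y - m) ** 2 for y, m in zip(target, mdl(h, u)))


u = [1.0, 0.5, -0.3]
target = mdl([0.8, 0.0, 0.4, 0.0], u)      # "observed" nonlinear representation
h = [0.5, 0.1, 0.1, 0.1]                   # initial hypothesis
err_start = total_error(h, u, target)

# For this model the exact derivative is 2 * (u*h)[t] * u[t - k]; compare it with (7).
t, k = 3, 1
analytic = 2.0 * convolve(u, h)[t] * u[t - k]
fd_check = fd_gradient(h, u, t, k)

mu = 0.02
for _ in range(300):                       # LMS sweeps over the output samples
    for t_idx, y in enumerate(target):
        err = y - mdl(h, u)[t_idx]
        grads = [fd_gradient(h, u, t_idx, kk) for kk in range(len(h))]
        h = [hk + 2.0 * mu * err * g for hk, g in zip(h, grads)]
err_end = total_error(h, u, target)
print(err_start, err_end, analytic, fd_check)
```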

The  relation  between  matched  and  inverse  filtering  can  be  understood  by  considering  a  filter 
that  forms  a  minimum  mean-square  error  estimate  of  the  target  impulse  response  from  an  echo  time 
series. If the prior estimate of the target transfer function (the Fourier transform of the unknown
target impulse response) is $H_{est}(f)$, the estimating filter has transfer function [30,31]


$$V(f) = \frac{E\{|H_{est}(f)|^2\}\, U^*(f)}{\Delta N(f) + E\{|H_{est}(f)|^2\}\, |U(f)|^2} \qquad (8)$$


where U(f) is the Fourier transform of the transmitted signal, Δ is the expected duration of the
target impulse response, and N(f) is the noise power spectral density. Two approximations to the
right-hand side of (8) are

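$$V(f) \approx [U(f)]^{-1} \quad \text{if the signal-to-noise ratio (SNR) is large} \qquad (9)$$

and

$$V(f) \approx \frac{E\{|H_{est}(f)|^2\}\, U^*(f)}{\Delta N(f)} \quad \text{if SNR is small.} \qquad (10)$$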


If  SNR  is  large,  then  the  target  impulse  response  (which  has  been  modeled  as  a  projection  of  the 
reflectivity  distribution  onto  the  range  axis)  is  estimated  with  an  inverse  filter,  i.e.,  a  filter  that 
deconvolves  the  transmitted  signal  from  the  echo.  If  SNR  is  small,  if  the  noise  is  white,  and  if 
there  is  no  prior  information  about  the  target  transfer  function,  then  the  target  impulse  response  is 
estimated by a filter that is matched to the transmitted signal, i.e., a filter with transfer function
proportional to U*(f). The filter in (8) performs pulse compression regardless of SNR.
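Evaluating (8) at a single frequency shows both limits, plus one property that holds at every SNR. Because the denominator is real and positive, V(f) always has the phase of U*(f), so the filter undoes the phase dispersion of the transmitted signal (pulse compression) while the SNR only changes the amplitude weighting. The numbers below (a unit-magnitude U, unit noise density, unit duration, and a range of prior target powers) are arbitrary choices for illustration.

```python
import cmath


def estimating_filter(U, target_power, noise_psd, duration):
    """Eq. (8) at one frequency: V = E|H|^2 U* / (Delta N + E|H|^2 |U|^2)."""
    return target_power * U.conjugate() / (duration * noise_psd + target_power * abs(U) ** 2)


U = complex(0.6, -0.8)            # transmitted-signal spectrum at one frequency, |U| = 1
noise_psd, duration = 1.0, 1.0

high = estimating_filter(U, 1e6, noise_psd, duration)    # large SNR
low = estimating_filter(U, 1e-6, noise_psd, duration)    # small SNR

print(abs(high - 1 / U))                                  # approaches the inverse filter (9)
print(abs(low / (1e-6 / (duration * noise_psd)) - U.conjugate()))  # approaches (10)
for power in (1e-6, 1e-2, 1.0, 1e2, 1e6):
    V = estimating_filter(U, power, noise_psd, duration)
    print(power, cmath.phase(V), cmath.phase(U.conjugate()))       # same phase at every SNR
```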




9.  Top-down,  bottom-up  gradient  descent  and  pulse  compression  with  spectrograms 

Top-down, bottom-up gradient descent processing can be used to solve a controversial modeling
problem in animal echolocation: How can animals implement a pulse compression filter with
~1 cm range resolution and an interaural processor with time difference acuity of ~7 µs when the
traditional model of the auditory system uses a bank of bandpass filters followed by envelope
detectors with relatively long integration times? The envelope detectors square-and-smooth (energy
detect) filter outputs over ~260 µs for dolphins [32,33], corresponding to a range interval of 20 cm
in water (Figure 7). For bats, the integration interval is approximately 400 µs [34],
corresponding to a range interval of 7 cm in air.
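The range intervals quoted above follow from the two-way travel relation Δr = cT/2. The sketch below assumes nominal sound speeds of 1500 m/s in seawater and 343 m/s in air, which the text does not state; with those values the results round to the 20 cm and 7 cm figures.

```python
def range_interval_m(integration_time_s, sound_speed_m_s):
    """Two-way range extent spanned by an integration time: c * T / 2."""
    return sound_speed_m_s * integration_time_s / 2.0


dolphin_m = range_interval_m(260e-6, 1500.0)   # assumed seawater sound speed
bat_m = range_interval_m(400e-6, 343.0)        # assumed air sound speed (about 20 C)
print(f"dolphin: {dolphin_m * 100:.1f} cm, bat: {bat_m * 100:.1f} cm")
```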

To  solve  the  modeling  problem,  it  is  necessary  to  show  that  pulse  compression  can  occur  via 
operations  on  time-frequency  (spectrogram)  representations  such  as  the  one  in  Figure  7.  This 
possibility  can  be  demonstrated  by  considering  the  following  relation  between  spectrograms  and 
magnitude-squared  cross-ambiguity  functions  [15]: 

$$\iint S_{echo,filter}(t,f)\, S_{signal,filter}(t+\tau,f)\, dt\, df = \iint |X_{signal,echo}(t,f)|^2\, |X_{filter,filter}(t+\tau,f)|^2\, dt\, df \qquad (11)$$

where $S_{echo,filter}(t,f)$ is the observed echo spectrogram in Figure 7 and “filter” refers to a baseband
version of the bandpass filters that are shown in the figure. $S_{signal,filter}(t,f)$ is the observed
spectrogram of the transmitted signal. The desired echo representation for a semicoherent, Doppler
tolerant SAS processor is the envelope detected signal-echo cross-correlation function

$$R_{signal,echo}(\tau) = |X_{signal,echo}(\tau,0)| \qquad (12)$$

which is known if $|X_{signal,echo}(t,f)|^2$ is known. The magnitude-squared filter auto-ambiguity function
$|X_{filter,filter}(t,f)|^2$ is a smooth function of time and frequency that presumably is known to the animal.


[Figure 7 diagram: echo → bandpass filter at $f_i$ → envelope detector with ~260 µs integration time → echo spectrogram $S_{echo,filter}(t,f)$]


Figure  7.  Traditional  model  of  the  peripheral  auditory  system  at  frequencies  above  five  kHz,  using 
integration  times  corresponding  to  the  critical  interval  in  dolphins. 
Pulse compression can thus occur if (11) can be solved for $|X_{signal,echo}(t,f)|^2$, where all the other
functions in the integral equation are assumed to be observable or known a priori.

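The solution $|X_{signal,echo}(t,f)|^2$ can be obtained from (11) by using gradient descent to find the
signal-echo magnitude-squared cross-ambiguity function that minimizes the mean-squared error

$$\mathrm{MSE} = \int \left[ \iint S_{echo,filter}(t,f)\, S_{signal,filter}(t+\tau,f)\, dt\, df - \iint |X_{signal,echo}(t,f)|^2\, |X_{filter,filter}(t+\tau,f)|^2\, dt\, df \right]^2 d\tau = \int \mathrm{error}^2(\tau)\, d\tau \qquad (13)$$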


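The LMS update equation is

$$\left[|X_{signal,echo}(t,f)|^2\right]_{n+1} = \left[|X_{signal,echo}(t,f)|^2\right]_n - \mu\, \frac{\partial\, \mathrm{error}^2(\tau)}{\partial \left[|X_{signal,echo}(t,f)|^2\right]_n} = \left[|X_{signal,echo}(t,f)|^2\right]_n + 2\mu\, [\mathrm{error}(\tau)]\, |X_{filter,filter}(t+\tau,f)|^2 \qquad (14)$$

iterated over all t, f, and τ values. This equation can be implemented with a top-down, bottom-up
process as illustrated in Figure 8.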
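A discretized version of the update (14) can be sketched as follows, under several assumptions that are not in the source: (t, f) is a small integer grid, the filter auto-ambiguity term |X_filter,filter(t, f)|² is modeled as a unit-peak Gaussian `K`, and the "observed" spectrogram correlation on the left of (11) is synthesized from a hypothetical |X_signal,echo|² containing two peaks. Each pass over τ adds 2μ·error(τ)·K(t + τ, f) to every cell, as in (14), and the discrete version of the squared error (13) falls as the iteration proceeds.

```python
import math

T, FREQS = 12, range(-2, 3)           # time bins and baseband frequency bins (hypothetical grid)
TAUS = range(-(T + 2), 3)             # lags at which the correlation is compared


def K(t, f):
    """Assumed magnitude-squared filter auto-ambiguity function: a unit-peak Gaussian."""
    return math.exp(-(t * t + f * f) / 2.0)


def correlation(P, tau):
    """Right side of (11) on the grid: sum over t, f of P(t, f) * K(t + tau, f)."""
    return sum(P[t][f] * K(t + tau, f) for t in range(T) for f in FREQS)


def mse(P, observed):
    """Discrete version of (13)."""
    return sum((observed[tau] - correlation(P, tau)) ** 2 for tau in TAUS)


truth = [{f: 0.0 for f in FREQS} for _ in range(T)]
truth[4][0], truth[8][1] = 1.0, 0.5                  # hypothetical |X_signal,echo|^2
observed = {tau: correlation(truth, tau) for tau in TAUS}

P = [{f: 0.0 for f in FREQS} for _ in range(T)]     # internal (top-down) model
mu = 0.1
start = mse(P, observed)
for _ in range(200):
    for tau in TAUS:
        err = observed[tau] - correlation(P, tau)
        for t in range(T):
            for f in FREQS:
                P[t][f] += 2.0 * mu * err * K(t + tau, f)    # update (14)
end = mse(P, observed)
print(start, end)
```

Because (11) compares only a one-dimensional function of τ, the sketch checks that the error falls; recovering a unique |X_signal,echo|² would need the additional constraints or observations discussed in the text.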


[Figure 8 diagram: the input is the echo spectrogram $S_{echo,filter}(t,f)$]


Figure  8.  Pulse  compression  via  top-down,  bottom-up  processing  of  the  echo  spectrogram. 
A binaural version of the processor in Figure 8 estimates interaural delay by substituting the
echo at the second ear for the signal, as illustrated in Figure 9. The processor correlates the
spectrogram from ear 1 with the spectrogram from ear 2 in order to estimate the cross-correlation
function magnitude corresponding to the input signals at the two ears, before the signals are
transformed into spectrograms. Spectrogram correlation implies cross-correlation of pairs of
envelope detected bandpass filter outputs from the two ears at each frequency channel. This
cross-correlation operation can be approximated by coincidence detectors that operate on the
neuronally coded signals in two delay lines, one from each ear. This operation corresponds to the
Jeffress-Licklider model for binaural localization [35], which has been neurologically verified in the
barn owl by Konishi [36].
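The delay-line coincidence scheme can be sketched with binary spike trains. In this hypothetical example, ear 2 receives a copy of the ear 1 spike train delayed by three time bins; each coincidence detector corresponds to one internal delay and counts how often spikes from the two ears arrive together, and the detector with the highest count gives the interaural delay estimate. The bin size, firing probability, and function names are assumptions, not parameters of the Jeffress-Licklider model.

```python
import random


def coincidence_counts(spikes1, spikes2, max_delay):
    """One coincidence detector per internal delay d: count bins where ear 1 fires at t
    and ear 2 fires at t + d."""
    n = len(spikes1)
    return {d: sum(spikes1[t] * spikes2[t + d] for t in range(n) if 0 <= t + d < n)
            for d in range(-max_delay, max_delay + 1)}


rng = random.Random(1)
n_bins, true_delay = 400, 3
ear1 = [1 if rng.random() < 0.2 else 0 for _ in range(n_bins)]
ear2 = [ear1[t - true_delay] if t >= true_delay else 0 for t in range(n_bins)]

counts = coincidence_counts(ear1, ear2, max_delay=8)
estimate = max(counts, key=counts.get)          # most active coincidence detector
print(estimate, counts[estimate])
```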

Figure 9 suggests that the Jeffress model may be incomplete at high frequencies; it should
perhaps be augmented by a bottom-up, top-down process that sharpens the estimate of interaural
delay by converting the spectrogram cross-correlation function into the magnitude-squared
cross-ambiguity function of the two input signals. Such augmentation applies to frequencies that are
too high for auditory neurons to carry phase information. Since bats are sensitive to echo phase shifts
[7,8], phase information in bat auditory neurons may deteriorate only at frequencies above 20 kHz.


[Figure 9 diagram: ear 1 output spectrogram $S_{sig1,filter}(t,f)$ and ear 2 output spectrogram $S_{sig2,filter}(t,f)$]


Figure  9.  Interaural  delay  estimation  via  top-down,  bottom-up  processing  of  echo  spectrograms. 
Interaural  delay  is  obtained  by  cross-correlating  the  input  signals  at  the  two  ears.  Cross 
correlation  of  spectrograms,  which  are  the  output  signal  representations  at  the  two  ears,  is  part  of 
the  processor,  but  top-down,  bottom-up  iteration  is  also  used. 
At frequencies that are sufficiently low for transmission of phase information along the auditory
nerve, the spectrogram model in Figure 7 can be used without the envelope detectors. The
spectrogram without envelope detection is

$$s_{signal,filter}(t,f) = \int \mathrm{signal}(\tau)\, \mathrm{filter}(t-\tau)\, \exp[j2\pi f(t-\tau)]\, d\tau \qquad (15)$$

where filter(τ) is the impulse response of the baseband filter function that is used to form the
spectrogram. Cross-correlation of phase-sensitive spectrograms from the two ears yields

$$\iint s_{signal1,filter}(t,f)\, s^*_{signal2,filter}(t+\tau,f)\, dt\, df = E_{filter}\, R_{signal1,signal2}(\tau) \qquad (16)$$

where $E_{filter}$ is the energy of the filter impulse response. When the cross-correlation in (16) is
synthesized by first time-correlating the outputs of each filter pair at a given frequency and then
summing the results over filter pairs at different frequencies, the processor can be implemented
with a Jeffress model.
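Equation (16) can be checked numerically with a discrete analog of (15). In the sketch below (an assumed discretization, not the source's processor), each spectrogram sample is the sum over m of x[m]·g[n − m]·exp(j2πk(n − m)/N) with N frequency bins and a real filter g no longer than N. Summing s1(n, k)·conj(s2(n + τ, k)) over n and k then equals N·E_filter·R12(τ) exactly; the factor N is the discrete counterpart of the frequency integral. The signals, filter, and function names are hypothetical.

```python
import cmath
import random


def stft_sample(x, g, n, k, nfreq):
    """Discrete analog of (15): sum over m of x[m] * g[n - m] * exp(j 2 pi k (n - m) / nfreq)."""
    total = 0j
    for m, xm in enumerate(x):
        lag = n - m
        if 0 <= lag < len(g):
            total += xm * g[lag] * cmath.exp(2j * cmath.pi * k * lag / nfreq)
    return total


def spectrogram_xcorr(x1, x2, g, tau, nfreq):
    """Left side of (16): sum over n, k of s1(n, k) * conj(s2(n + tau, k))."""
    return sum(stft_sample(x1, g, n, k, nfreq) * stft_sample(x2, g, n + tau, k, nfreq).conjugate()
               for n in range(len(x1) + len(g)) for k in range(nfreq))


def xcorr(x1, x2, tau):
    """R12(tau) = sum over u of x1[u] * x2[u + tau] (real signals)."""
    return sum(x1[u] * x2[u + tau] for u in range(len(x1)) if 0 <= u + tau < len(x2))


rng = random.Random(0)
ear1 = [rng.gauss(0.0, 1.0) for _ in range(16)]
ear2 = [0.0, 0.0] + ear1[:-2]                    # ear 2 hears the same signal 2 samples later
g = [0.2, 0.6, 1.0, 1.0, 0.6, 0.2]               # hypothetical real baseband filter
nfreq = 8                                         # must be >= len(g) for the identity to be exact
e_filter = sum(v * v for v in g)

for tau in range(-4, 5):
    lhs = spectrogram_xcorr(ear1, ear2, g, tau, nfreq)
    rhs = nfreq * e_filter * xcorr(ear1, ear2, tau)
    print(tau, lhs.real, rhs)
```

Summing over k first collapses the frequency sum to a single lag, which is why the result can also be assembled channel by channel and summed, as in the Jeffress implementation described above.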

Equations  (11-16)  pertain  to  spectrogram  analysis  with  constant-bandwidth  filters.  The 
corresponding  equations  for  proportional-bandwidth  filters  (which  are  more  biomimetic)  are  given 
in  Appendix  B. 

10.  Top-down,  bottom-up  ART-SAS  processing  and  beam  deconvolution 

Dolphin  head  scanning  behavior  suggests  that  SAS  may  be  augmented  by  beam  deconvolution. 
This  deconvolution  process  can  be  included  in  an  ART-type  gradient  descent  algorithm.  A  high 
resolution  internal  model  is  convolved  or  smeared  with  the  known  beam  patterns  and  is  then 
compared  with  incoming  multi-beam  data  to  generate  corrections  to  the  model.  A  set  of 
overlapping  beam  patterns  can  be  generated  by  a  binaural  system  that  implements  multiple 
direction-of-arrival  hypotheses  in  parallel.  Head  scanning  generates  extra  independent 
observations  by  changing  the  cross-range  distribution  of  transmitted  power.  Binaural 
representations  can  be  incorporated  into  an  ART  process  by  predicting  the  data  at  each  ear. 
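The multi-beam case can be sketched with the row-action form of ART (Kaczmarz's method), in which each beam measurement corrects the internal model along that beam's pattern. The example assumes 32 bearing bins, Gaussian beam patterns with a two-bin standard deviation steered at every bin (standing in for head scanning), two point reflectors, and noise-free data; these choices and the function names are illustrative assumptions rather than a model of the dolphin beam.

```python
import math

N_BINS = 32


def beam(center, sigma=2.0):
    """Assumed Gaussian beam pattern over bearing bins."""
    return [math.exp(-((j - center) ** 2) / (2.0 * sigma * sigma)) for j in range(N_BINS)]


def measure(pattern, reflectivity):
    """Beam output: reflectivity weighted (smeared) by the beam pattern."""
    return sum(b * a for b, a in zip(pattern, reflectivity))


def art(patterns, data, sweeps=300, relax=1.0):
    """Kaczmarz/ART: each beam's residual is added back along that beam's pattern."""
    a = [0.0] * N_BINS
    norms = [sum(b * b for b in p) for p in patterns]
    for _ in range(sweeps):
        for pattern, y, norm2 in zip(patterns, data, norms):
            step = relax * (y - measure(pattern, a)) / norm2
            a = [ak + step * b for ak, b in zip(a, pattern)]
    return a


truth = [0.0] * N_BINS
truth[10], truth[18] = 1.0, 0.6
patterns = [beam(c) for c in range(N_BINS)]        # one beam per scan position
data = [measure(p, truth) for p in patterns]

estimate = art(patterns, data)
residual = sum((y - measure(p, estimate)) ** 2 for p, y in zip(patterns, data))
print(max(range(N_BINS), key=lambda j: estimate[j]), residual / sum(y * y for y in data))
```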

11.  Tracking  and  motion  compensation  with  top-down,  bottom-up  ART-SAS  processing 

Hypotheses  about  translational  and  rotational  motion  of  the  sonar  and  the  target  can  be 
incorporated  into  an  ART  processor.  The  best  motion  hypothesis  corresponds  to  the  least  error 
between  the  echo  data  and  the  prediction.  The  best  motion  hypothesis  can  be  used  to  track  the 
target,  to  predict  its  location  and  orientation  at  the  next  echo,  and  to  characterize  body  motion  in  a 
fish  or  wing  beats  in  a  bat.  This  process  is  a  form  of  image-based  tracking,  which  will  be 
discussed  in  Section  18. 
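Choosing a motion hypothesis by prediction error can be sketched in one dimension. Here each echo is an envelope peak at the target's range bin, the target moves a constant but unknown number of range bins per ping, the starting range is assumed known, and a small set of candidate velocities plays the role of the motion hypotheses. The hypothesis with the least squared error is then used to predict where the next echo should appear. All names and values are hypothetical.

```python
import math


def envelope(center, bins=60, width=1.5):
    """Hypothetical envelope-detected echo with a peak at range bin `center`."""
    return [math.exp(-((b - center) / width) ** 2) for b in range(bins)]


def hypothesis_error(echoes, start, velocity):
    """Squared error between the observed echoes and those predicted by one motion hypothesis."""
    err = 0.0
    for ping, echo in enumerate(echoes):
        predicted = envelope(start + velocity * ping)
        err += sum((e - p) ** 2 for e, p in zip(echo, predicted))
    return err


start, true_velocity = 20.0, 1.5
echoes = [envelope(start + true_velocity * ping) for ping in range(5)]

candidates = [0.5 * i for i in range(7)]            # 0.0, 0.5, ..., 3.0 bins per ping
best = min(candidates, key=lambda v: hypothesis_error(echoes, start, v))
next_range = start + best * len(echoes)             # predicted location at the next ping
print(best, next_range)
```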

12.  Top-down,  bottom-up  gradient  descent  with  low  resolution  neuronal  maps 

In Figures 8 and 9, the data representation at the comparator is a smeared (low resolution)
version of the echo spectrogram, not a sharpened one. Similarly, the superior colliculus is used for
integration of acoustic, visual, and somatosensory data via spatially registered, low resolution
maps [37]. Top-down, bottom-up processing can use such low resolution maps to create a high
resolution image at a higher level of the brain.
A basic question in neurophysiology is how the locations of point stimuli (e.g., in the retina) can
be inferred from the collective discharge of a neuronal population [38]. The use of a low resolution
map  to  improve  a  high  resolution  internal  model  via  top-down,  bottom-up  gradient  descent 
provides  an  answer  to  this  question.  The  collective  discharge  is  a  consequence  of  divergent 
connections  from  various  sensor  elements,  and  it  is  analogous  to  a  low  resolution  map.  In  this 
case,  the  locations  of  the  point  stimuli  can  be  estimated  with  top-down,  bottom-up  stochastic 
gradient  descent,  using  the  LMS  algorithm.  To  implement  this  process,  the  estimator  must  have  a 
sufficiently  accurate  ensemble-average  model  of  the  divergent  process  (the  smearing  function)  that 
maps  a  single  point  into  a  collective  discharge.  A  payoff  for  such  divergent  stimulus  coding  is  that 
neighboring  sensors  with  overlapping  excitation  curves  contribute  to  the  representation  of  a  point 
stimulus.  The  gradient  descent  deconvolution  process  can  use  these  extra  observations  to  obtain  a 
more  accurate  stimulus  representation  than  can  be  obtained  from  a  single  sensor. 
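The population-decoding idea can be sketched in one dimension. In this hypothetical example, eleven sensors with overlapping Gaussian tuning curves (unit spacing, standard deviation 1.5) respond to a point stimulus at 4.37. The estimator knows the tuning curves (the smearing function) and runs gradient descent on the squared error between the observed collective discharge and the discharge predicted for a hypothesized stimulus location, starting from the best single sensor. The tuning model, step size, and names are assumptions.

```python
import math

CENTERS = [float(c) for c in range(11)]   # sensor best-stimulus values (unit spacing)
SIGMA = 1.5


def discharge(x):
    """Known divergent mapping from a point stimulus to the population response."""
    return [math.exp(-((x - c) ** 2) / (2 * SIGMA ** 2)) for c in CENTERS]


def decode(observed, x0, mu=0.5, steps=200):
    """Gradient descent on E(x) = sum_i (observed_i - discharge_i(x))^2."""
    x = x0
    for _ in range(steps):
        pred = discharge(x)
        # dE/dx = -2 * sum_i error_i * dpred_i/dx, where dpred_i/dx = -(x - c_i) / sigma^2 * pred_i
        grad = -2.0 * sum((o - p) * (-(x - c) / SIGMA ** 2) * p
                          for o, p, c in zip(observed, pred, CENTERS))
        x -= mu * grad
    return x


stimulus = 4.37
observed = discharge(stimulus)
best_single = CENTERS[max(range(len(CENTERS)), key=lambda i: observed[i])]  # one sensor alone
estimate = decode(observed, x0=best_single)
print(best_single, estimate)
```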

13.  Top-down,  bottom-up  gradient  descent  and  hyperacuity 

The gradient descent deconvolution process provides a mechanism for hyperacuity [39].
Hyperacuity involves sensitivity to a small difference between a reference stimulus and another
stimulus, e.g., two parallel line segments on a vernier scale, which may be collinear or slightly
displaced. For sensitivity to small differences, the mean-squared error in (1)
can be changed to mean absolute error:

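$$\mathrm{MAE}(A) = (2\pi M R_{\max})^{-1}\, E\left\{ \sum_{m=1}^{M} \int_{0}^{2\pi}\!\int_{0}^{R_{\max}} \left|\mathrm{echo}_m(r,\theta) - \mathrm{mdl}_m(A;r,\theta)\right| dr\, d\theta \right\} \qquad (17)$$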

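In this case, the LMS update equation becomes

$$a_{kl}(n+1) = a_{kl}(n) - \mu\, \frac{\partial\, \mathrm{AE}(A;r,\theta,m)}{\partial a_{kl}} \quad \text{for all } k, l \text{ values} \qquad (18)$$

where

$$\frac{\partial\, \mathrm{AE}(A;r,\theta,m)}{\partial a_{kl}} = -\mathrm{sgn}\left[\mathrm{echo}_m(r,\theta) - \mathrm{mdl}_m(A;r,\theta)\right]\left[\frac{\partial\, \mathrm{mdl}_m(A;r,\theta)}{\partial a_{kl}}\right] \qquad (19)$$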

In (19), sgn(error) equals one if error > 0 and minus one if error < 0. The gradient changes its sign
but not its magnitude when the error becomes positive rather than negative, even for extremely
small absolute error values. This behavior is analogous to computing the difference between the
responses of two tuned neurons with slightly displaced and extremely steep tuning curves, where
the actual stimulus value is midway between the best stimulus values for the two neurons. Even
more sensitivity to small errors can be introduced by using the p-th root (p = 2, 3, ...) of the absolute
error value in (17). In this case, the algorithm must be designed to cope with unbounded values of
the partial derivatives in (18) when the error approaches zero.
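The different sensitivities to small errors can be compared through the error-dependent factor that multiplies ∂mdl/∂a_kl in each update. The sketch below evaluates that factor for the squared error (5), the absolute error (19), and the p-th root of the absolute error. The optional `cap` is one hypothetical way to cope with the unbounded p-th-root derivative near zero error; it is not a method from the source.

```python
def error_factors(error, p=2, cap=None):
    """Magnitude of the error-dependent factor in the gradient for three cost functions."""
    e = abs(error)
    squared = 2.0 * e                      # from (5): d(e^2)/de = 2e, vanishes as e -> 0
    absolute = 1.0 if e > 0 else 0.0       # from (19): |sgn(e)| = 1 for any nonzero error
    root = (e ** (1.0 / p - 1.0)) / p if e > 0 else float("inf")  # d(e^(1/p))/de
    if cap is not None:
        root = min(root, cap)              # hypothetical safeguard against unbounded steps
    return squared, absolute, root


for err in (1e-1, 1e-2, 1e-4):
    print(err, error_factors(err), error_factors(err, cap=20.0))
```

As the error shrinks, the squared-error factor vanishes, the absolute-error factor stays at one, and the p-th-root factor grows without bound unless it is limited.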
14. Feature images


A  simplified,  conservative  model  for  biological  SAS  forms  an  image  by  noncoherent  summation 
of  all  echo  samples  corresponding  to  each  scattering  point  as  the  sonar  moves.  This  simplification 
allows  the  formation  of  images  that  represent  features  other  than  reflectivity  as  a  function  of 
location.  Volume  clutter  feature  images  are  insensitive  to  reflectivity,  and  represent  a  measure  of 
the  number  of  small,  Rayleigh  scatterers  in  a  pixel-sized  volume  element.  Rough/smooth  feature 
images  are  sensitive  to  the  variation  of  reflectivity  as  the  aspect  changes. 
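The noncoherent delay-and-sum model can be sketched directly: each echo is envelope detected, and every image pixel accumulates, from every sonar position, the envelope sample at that pixel's range. The example assumes twelve sonar positions on a circle of radius 30 around a 21 by 21 pixel scene, a single point target, Gaussian echo envelopes, and nearest-bin range lookup; these choices and the function names are illustrative.

```python
import math


def simulate_envelopes(sensors, target, n_bins=200, dr=0.25, width=1.0):
    """Envelope-detected echo versus range bin for each sonar position (one point target)."""
    envelopes = []
    for sx, sy in sensors:
        r_target = math.hypot(target[0] - sx, target[1] - sy)
        envelopes.append([math.exp(-((b * dr - r_target) / width) ** 2) for b in range(n_bins)])
    return envelopes


def back_project(envelopes, sensors, half=10, dr=0.25):
    """Noncoherent delay-and-sum: each pixel adds the envelope sample at its range."""
    image = {}
    for x in range(-half, half + 1):
        for y in range(-half, half + 1):
            total = 0.0
            for env, (sx, sy) in zip(envelopes, sensors):
                b = round(math.hypot(x - sx, y - sy) / dr)   # nearest range bin
                if 0 <= b < len(env):
                    total += env[b]
            image[(x, y)] = total
    return image


sensors = [(30.0 * math.cos(2 * math.pi * m / 12), 30.0 * math.sin(2 * math.pi * m / 12))
           for m in range(12)]
envelopes = simulate_envelopes(sensors, target=(3, -2))
image = back_project(envelopes, sensors)
print(max(image, key=image.get))
```

Because only envelopes are summed, no pulse-to-pulse phase coherence is required, and the summed quantity could be replaced by any other per-sample feature.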

15.  The  volume  clutter  feature  image 

A  volume  clutter  feature  image  is  obtained  by  detecting  Rayleigh  scatterers  in  an  amplitude- 
normalized  version  of  the  pulse  compressed  echo  [40].  Amplitude  normalization  eliminates 
conventional  reflectivity  as  a  target  feature.  The  volume  clutter  feature  image  is  bright  when  a 
pixel  contains  many  Rayleigh  scatterers,  e.g.,  bubbles  or  particles  that  are  small  relative  to  a 
wavelength.  If  the  small  scatterers  are  displaced  by  a  comparatively  large  target,  the  volume 
clutter  feature  image  becomes  dark.  A  comparatively  large  target  with  very  small  backscatter 
cross  section  shows  up  as  a  target-shaped  hole  in  the  volume  clutter  feature  image. 

Figure  10  (top,  left)  shows  a  SAS  reconstruction  of  a  Rockan  mine,  without  the  adaptive 
thresholding  that  is  usually  applied  to  suppress  clutter  in  the  final  image.  The  Rockan  is  a  low- 
cross  section  target  (for  back-scattered  sound)  that  resembles  a  large,  trapezoidally  shaped  clam 
shell.  The  target  is  surrounded  by  volume  clutter  consisting  of  small  bubbles  and  particulate 
matter  in  the  lake  water  where  the  measurements  were  made.  The  volume  clutter  appears  as  a 
hazy,  fog-like  image  surrounding  the  target.  The  top,  right  part  of  Figure  10  shows  a  volume 
clutter  feature  image  constructed  from  the  same  echo  data  as  in  the  top,  left  image.  In  this  image, 
the  target  is  suppressed  and  the  volume  clutter  is  accentuated.  Since  the  image  has  high  resolution, 
the  target  appears  as  a  hole  or  cavity  with  a  distinctive  shape.  The  bottom,  left  part  of  Figure  10 
shows  an  enhanced  target  image  that  is  obtained  by  multiplying  the  clutter  feature  image  by  a 
constant  and  subtracting  it  from  the  reflectivity  image.  The  bottom,  right  part  of  Figure  10  shows 
an  enhanced  clutter  image  that  is  obtained  by  multiplying  the  reflectivity  image  by  a  constant  and 
subtracting  it  from  the  clutter  feature  image. 

If  the  volume  clutter  were  to  become  more  reflective  relative  to  the  target,  the  image  at  the  top, 
left  in  Figure  10  would  become  nearly  uniform,  and  detection/classification  of  the  target  with  a 
reflectivity  image  would  become  very  difficult.  The  clutter  feature  image  on  the  top,  right  of 
Figure  10,  however,  would  become  even  better  at  revealing  the  presence  and  shape  of  the  target. 
Target  detection  with  a  clutter  feature  image  is  not  predicted  by  the  sonar  equation,  although  this 
type  of  detection  is  familiar  to  radiologists  who  work  with  medical  ultrasound.  (In  medical 
ultrasound,  organs  such  as  the  gall  bladder  and  other  structures  are  sometimes  identified  as  dark 
shapes  that  are  defined  by  the  surrounding  clutter  echoes  or  “speckle.”)  The  sonar  equation  is  a 
logarithmic  version  of  the  signal-to-interference  ratio  at  the  receiver  output,  where  the  “signal”  is 
associated  with  target  reflectivity  and  the  “interference”  is  associated  with  clutter  echoes  and  noise 
[41].  Detection  with  a  reflectivity  image  is  predicted  by  the  sonar  equation,  but  detection  with  a 
volume clutter feature image is predicted by the inverse of the sonar equation (the clutter-to-signal
ratio). Detection and classification with a volume clutter feature image as in Figure 10 may
explain how dolphins can find buried fish [42].
[Figure 10 panels: SAS reflectivity image (top left); volume clutter feature image (top right); reflectivity image - (weight × feature image) (bottom left); feature image - (weight × reflectivity image) (bottom right)]


Figure  10.  Top:  A  SAS  reflectivity  image  and  a  volume  clutter  feature  image  of  the  same  low- 
reflectivity  target.  Bottom:  Images  constructed  from  weighted  sums  of  the  reflectivity  and  feature 
images  in  order  to  enhance  or  suppress  the  target  relative  to  the  volume  clutter.  The  images  shown 
here  were  constructed  from  echoes  obtained  over  360  degrees  of  rotation  with  3.7  degree 
increments  between  echoes. 
16. The rough/smooth feature image


Another  feature  image  can  represent  smooth  or  rough  reflectors.  For  smooth  and  rough  feature 
images,  the  sum  in  the  delay-and-sum  SAS  beam  former  is  replaced  by  a  quantity  that  depends 
upon  the  aspect-dependent  variation  of  the  echo  samples  that  contribute  to  the  sum.  Pixels  with 
large  aspect-dependent  amplitude  variation  are  associated  with  smooth  surfaces  that  have  large 
back-scatter  reflections  at  a  few  aspect  angles  and  weak  reflections  at  other  angles.  Pixels  with 
small  aspect-dependent  variation  are  associated  with  small,  isotropic  scattering  points  that  are 
found  on  rough  surfaces. 
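The source does not specify the variation measure, so the sketch below uses a hypothetical pair of features. The smooth feature is the spread (maximum minus minimum) of a pixel's echo samples across aspects, which is large for specular facets that reflect strongly at only a few angles; the rough feature is the minimum sample, which stays high only for scatterers that return energy at every aspect. As in the composite images described below, the larger feature wins at each pixel.

```python
def rough_smooth_features(samples):
    """Hypothetical aspect-variation features for one pixel (samples = echo amplitude per aspect)."""
    smooth = max(samples) - min(samples)   # large when energy is concentrated at few aspects
    rough = min(samples)                   # large when every aspect returns energy
    return smooth, rough


def composite_label(samples):
    """Composite image: the larger feature wins (blue = smooth, red = rough)."""
    smooth, rough = rough_smooth_features(samples)
    return "blue (smooth)" if smooth > rough else "red (rough)"


specular_facet = [0.02, 0.03, 1.8, 0.05, 0.02, 0.01, 0.03, 0.02]
rough_patch = [0.45, 0.52, 0.48, 0.50, 0.55, 0.47, 0.51, 0.49]
print(composite_label(specular_facet), composite_label(rough_patch))
```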

Figure 11 shows composite rough/smooth bioSAS feature images for four different mines:
Rockan, Manta, VEMS, and MO-8. The smooth-target feature image and the rough-target feature
image compete for representation of each pixel in Figure 11. Pixel values in the composite image
represent the larger of the two feature images at each sampling point. Smooth surface pixels are
colored blue and rough surface pixels are red. Because the Rockan is designed to suppress
back-scattered specular reflections, the rough-surface feature image dominates the Rockan image,
making it almost totally red. The Manta resembles a truncated upright cone with a rough plate (the
actuator) on the top planar surface. A SAS reflectivity image shows the outer shell with no
indication of the rough plate; the interior appears to be hollow. A smooth-target feature image also
shows only the outer shell. A rough-target feature image, however, shows only the rough plate.
The ability to “see” the actuator and to use it for target recognition is greatly enhanced by using the
rough-smooth composite color feature image.

BioSAS  feature  images,  like  more  conventional  bioSAS  reflectivity  images,  degrade  gracefully 
when  the  aspect  sampling  interval  becomes  large.  Large  aspect  sampling  intervals  correspond  to 
high  area  coverage  rate.  Figure  12  shows  a  composite  rough/smooth  feature  image  of  a  Manta 
mine,  obtained  with  an  aspect  sampling  interval  of  twenty  degrees  over  an  aspect  observation 
interval  of  360  degrees.  The  image  is  thus  constructed  with  only  eighteen  echoes. 


17.  Motion-based  feature  images 

Other  feature  images  can  be  sensitive  to  motion.  Semi-coherent  processing  of  HFM/LPM 
(hyperbolic  frequency  modulated,  linear  period  modulated)  bat-like  signals,  for  example,  can  be 
used  to  estimate  acceleration  from  frequency  shifts  [14],  and  these  acceleration  estimates  can  be 
represented  by  an  image.  Motion-based  images  may  be  valuable  for  motion  compensation  to 
obtain  a  more  accurate  SAS  image,  for  separating  a  moving  target  from  surrounding  clutter,  or  for 
discriminating  moving  clutter  from  stationary  targets  (e.g.,  seaweed  that  shifts  with  surge  in 
shallow  water). 
Figure 11. Composite color images of four mines with smooth scatterer image in blue and
rough scatterer image in red (3.7 deg aspect sampling intervals over 360 deg).

Top left: Rockan. Top right: Manta. Lower left: VEMS. Lower right: MO-8.


Figure  12.  A  composite  smooth/rough  BioSAS  feature  image  obtained  from  echoes  measured  at 
20  degree  intervals  over  a  360  degree  observation  interval  (18  different  aspect  angles).  Smooth 
surfaces  are  blue;  rough  surfaces  are  red.  The  images  are  constructed  with  echoes  from  a  Manta 
mine,  using  a  dolphin-like  signal.  The  echoes  are  from  the  SPAWAR/ARL  data  set. 
18. Image-based tracking


A  realistic  acoustic  imaging  model  for  a  biological  sonar  system  must  allow  for  freedom  of 
sensor  and  target  (prey)  motion.  One  way  to  achieve  this  goal  is  to  use  the  images  themselves  to 
compensate  for  deviations  between  actual  motion  and  predicted  motion  [43].  A  different  motion 
compensation  method  relies  on  cross-correlation  between  overlapping  echoes. 

A  theoretical  counter-example  to  cross-correlation  for  image  registration  occurs  for  convex 
targets  with  a  single  specular  reflection  at  each  aspect  angle.  For  the  target  in  Figure  1,  different 
area  elements  on  the  target  surface  have  maximum  reflectivity  as  they  become  orthogonal  to  the 
propagation  direction.  The  echo  cross-correlator  response  is  maximized  by  moving  the  effective 
center  of  rotation  so  that  the  single,  large  echo  at  each  aspect  always  corresponds  to  the  same 
image  pixel,  even  though  the  echoes  are  from  different  parts  of  the  target  surface.  A  similar 
phenomenon  has  been  encountered  at  the  Naval  Coastal  Systems  Station  (CSS)  in  Panama  City, 
FLA. CSS has found that a SAS echo cross-correlation motion compensation system breaks down
when there is an insufficient number of small scatterers to yield an accurate echo
cross-correlation measure.

Another  counter-example  for  cross-correlation  processing  occurs  when  there  is  a  large  aspect 
difference  between  “looks.”  In  this  case,  the  echoes  from  a  random  array  of  point  scatterers  will 
become decorrelated because of the aspect change, and a motion compensation method that relies
on echo cross-correlation will again deteriorate. Large aspect differences are not expected for
conventional  SAS  because  of  area  coverage  rate  constraints  (avoidance  of  spatial  undersampling), 
but  wide-band  bioSAS  can  form  images  from  observations  with  large  aspect  differences. 
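The decorrelation argument can be illustrated with a toy model in which every parameter is hypothetical: ten point scatterers placed at random in a 1 m square, a far-field geometry with unit sound speed, and a Gaussian echo envelope for each scatterer at twice its range projection onto the look direction. The sketch reports the peak of the energy-normalized echo cross-correlation between a reference aspect and a second aspect. A small aspect change leaves the echoes nearly identical up to a lag, whereas a large change scrambles the relative delays.

```python
import numpy as np

def echo(points, aspect_deg, t, width=0.01):
    """Noise-free echo envelope of a 2D scatterer field seen from one aspect.

    Hypothetical far-field model with unit sound speed: each scatterer returns
    a Gaussian pulse at twice its range projection onto the look direction."""
    a = np.deg2rad(aspect_deg)
    look = np.array([np.cos(a), np.sin(a)])
    delays = 2.0 * points @ look                 # two-way delay of each scatterer
    return np.exp(-((t[:, None] - delays[None, :]) / width) ** 2).sum(axis=1)

def peak_xcorr(e1, e2):
    """Peak over all lags of the energy-normalized cross-correlation."""
    xc = np.correlate(e1, e2, mode="full")
    return xc.max() / np.sqrt(np.dot(e1, e1) * np.dot(e2, e2))

rng = np.random.default_rng(0)
points = rng.uniform(-0.5, 0.5, size=(10, 2))     # random array of point scatterers
t = np.linspace(-1.5, 1.5, 6001)                  # time axis covering all delays

for d_aspect in (0.1, 2.0, 45.0):
    rho = peak_xcorr(echo(points, 0.0, t), echo(points, d_aspect, t))
    print(f"aspect change {d_aspect:5.1f} deg: peak correlation {rho:.2f}")
```

A motion compensator that registers looks by this correlation peak therefore loses its reference as the aspect difference between looks grows.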

Image-based tracking has been introduced in the context of ART processing (Section 11), but it
also  can  be  used  for  delay-and-sum  (back  projection)  SAS  imaging.  A  delay-and-sum  SAS 
processor  can  construct  an  image  sequentially,  by  adding  the  appropriate  sample  from  the  latest 
echo  to  a  sum  of  sample  values  from  previous  echoes  (one  sample  from  each  echo).  Each  of  these 
echo  samples  presumably  corresponds  to  the  range  of  a  specific  point  scatterer,  as  seen  from 
different  sensor  locations.  Hypothesized  motion  is  represented  by  delay  corrections  that  are 
inserted  into  the  delay-and-sum  SAS  beam  former.  For  the  most  recent  echo,  different  motion 
hypotheses  result  in  different  “test  images.”  The  best  test  image  corresponds  to  the  best  motion 
hypothesis. 

A  criterion  for  choosing  the  best  test  image  is  obtained  from  the  SAS  point  spread  function  or 
range,  cross-range  ambiguity  function  (RCRAF)  in  Section  4  (Figure  2).  The  back  projection 
SAS  image  is  the  convolution  of  the  SAS  RCRAF  with  the  actual  reflector  distribution.  In  the 
frequency  domain,  the  2D  or  3D  Fourier  transform  of  the  RCRAF  is  multiplied  by  the  2D  or  3D 
Fourier  transform  of  the  actual  reflector  distribution.  The  best  image  is  obtained  when  the  Fourier 
transform  of  the  SAS  RCRAF  is  constant,  i.e.,  when  the  SAS  RCRAF  most  resembles  an  impulse. 

If  the  most  recent  delay  in  the  SAS  delay-and-sum  process  is  incorrect,  a  line  that  should  go 
through  the  center  of  the  asterisk  in  Figure  1  is  displaced.  The  resulting  SAS  RCRAF  is  less 
impulse-like, and its mean-square bandwidth is reduced. Since the Fourier transform of the SAS
image is the product of the Fourier transform of the actual reflectivity distribution with the
Fourier transform of the RCRAF, the mean-square bandwidth of the SAS image is reduced when the
image is formed with an incorrect version of the most recent delay. These observations imply that
the best test image has the largest mean-square bandwidth. The mean-square image bandwidth can be
computed directly from a test image, without using a multidimensional Fourier transform (by
Parseval's theorem, the second moment of the image spectrum equals the energy of the image
gradient). If u(x,y) is a test image, then its mean-square bandwidth is


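$$
B_u = \frac{\displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\left[\left|\frac{\partial u(x,y)}{\partial x}\right|^2 + \left|\frac{\partial u(x,y)}{\partial y}\right|^2\right]dx\,dy}{\displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\left|u(x,y)\right|^2\,dx\,dy} \qquad (20)
$$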


The  bandwidth  is  a  measure  of  variation  or  image  sharpness. 
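As a sketch of how Eq. (20) might be evaluated on a sampled test image, the fragment below approximates the derivatives by central differences (`np.gradient`) and replaces the integrals by sums; the pixel-area factors cancel in the ratio. The FFT version is included only as a cross-check of the Parseval equivalence with the frequency-domain definition and assumes a square grid. The Gaussian test blobs, grid, and function names are illustrative assumptions. For a Gaussian of width σ the ratio should approach 1/σ², so the narrower (sharper) blob scores higher.

```python
import numpy as np

def ms_bandwidth(u, dx):
    """Mean-square bandwidth of a sampled image, Eq. (20):
    gradient energy divided by image energy (finite-difference estimate)."""
    du_dy, du_dx = np.gradient(u, dx)          # axis 0 is y, axis 1 is x
    num = np.sum(np.abs(du_dx) ** 2 + np.abs(du_dy) ** 2)
    den = np.sum(np.abs(u) ** 2)
    return num / den                           # dx*dy area factors cancel

def ms_bandwidth_fft(u, dx):
    """Same quantity from the 2D spectrum (Parseval), for comparison only.
    Assumes a square grid with equal spacing dx along both axes."""
    p = np.abs(np.fft.fft2(u)) ** 2
    w = 2.0 * np.pi * np.fft.fftfreq(u.shape[0], d=dx)   # angular frequencies
    wy, wx = np.meshgrid(w, w, indexing="ij")
    return np.sum((wx ** 2 + wy ** 2) * p) / np.sum(p)

x = np.linspace(-8.0, 8.0, 321)
dx = x[1] - x[0]
X, Y = np.meshgrid(x, x)
for sigma in (1.0, 2.0):                       # a sharp and a blurred "scatterer"
    u = np.exp(-(X ** 2 + Y ** 2) / (2.0 * sigma ** 2))
    print(f"sigma={sigma}: B={ms_bandwidth(u, dx):.4f}, "
          f"FFT check={ms_bandwidth_fft(u, dx):.4f}, 1/sigma^2={1 / sigma ** 2:.4f}")
```

Only the spatial version is needed in practice; it uses neighboring pixels and two sums, which is what makes the criterion cheap to evaluate for many test images.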

A  block  diagram  of  an  image-based  tracker  is  shown  in  Figure  13.  Delay  corrections  are 
utilized  by  a  dynamic  model  that  predicts  the  next  delay  hypotheses.  This  model  can  include  sensor 
motion and translation/rotation of various parts of a moving target [43]. The image-based tracker
has  been  demonstrated  by  using  it  to  compensate  for  accidental  range  deviations  during  acquisition 
of  echoes  from  a  target  that  was  suspended  from  a  rotating  platform  by  thin  lines.  The  range 
deviations  are  caused  by  wobbling  of  the  suspended  target.  Image-based  tracking  has  been  tested 
further  by  deliberately  introducing  a  spurious  step  discontinuity  in  range  halfway  through  the 
imaging  process.  The  results  of  these  demonstrations  are  shown  in  Figures  14  and  15. 


19.  Motion  compensation,  dynamic  programming,  and  dynamic  models 

Motion  variability  of  a  single  point  scatterer  can  be  modeled  as  distortion  of  the  predicted  range 
vs  time  curve  of  the  scatterer.  If  the  hypothesized  delay  history  of  echoes  from  the  point  scatterer 
is  adaptively  modified  to  match  the  distortion,  the  resulting  motion  compensation  is  equivalent  to 
adaptive  focusing  with  a  multi-pulse  matched  filter. 

A  similar  distortion  problem  arises  in  speech  recognition.  A  phoneme  in  speech  data  is  often  a 
time  warped  version  of  a  reference  function  corresponding  to  the  same  phoneme.  The  problem  of 
classifying the distorted phoneme has been solved by dynamic programming [44]. This method is
an efficient technique for sequential implementation of a likelihood function and is similar to the
Viterbi algorithm for decoding communication signals [45,46]. The need for such efficiency can be
appreciated by considering the number of different trajectories that are implied by a simple
three-alternative model: the target delay (or the delay of a phoneme sample) at a given sampling
time is the same as the predicted delay or is slightly larger or smaller than the predicted value.
Figure 16 illustrates that there are three admissible trajectories (or phoneme time distortions) at
time $t_1$. Nine different admissible trajectories or time warps terminate on the five nodes at $t_2$,
twenty-seven such paths exist at $t_3$, and $3^n$ admissible trajectories reach the nodes at time $t_n$.
After n echoes or time samples, correlation of the delay history corresponding to each admissible
trajectory with echo data and computation of the maximum correlator output will result in a
receiver that is focused on an isolated point scatterer, despite the unpredictable delay
perturbations. Unfortunately, $3^n$ different reference functions are needed to form testable
hypotheses for unpredictable perturbations (time warps) after n echo time samples.
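To make the $3^n$ growth concrete, the following sketch searches every admissible trajectory of Figure 16 by brute force. It assumes a hypothetical correlator table `corr[k, j]`, holding the response for echo k when the delay offset from the prediction is `j - n` range bins, and the three admissible moves {-1, 0, +1} bin per echo. The toy table is simply peaked along a made-up wandering trajectory.

```python
import itertools
import numpy as np

STEPS = (-1, 0, 1)   # admissible change of the delay offset per echo (range bins)

def exhaustive_focus(corr):
    """Score every admissible delay trajectory; there are 3**n of them.

    corr[k, j] is a hypothetical correlator output for echo k when the delay
    offset from the prediction is j - n bins (column n means 'no offset')."""
    n = corr.shape[0]
    best_score, best_path, n_paths = -np.inf, None, 0
    for moves in itertools.product(STEPS, repeat=n):
        offsets = np.cumsum(moves)                   # offset after each echo
        score = corr[np.arange(n), n + offsets].sum()
        n_paths += 1
        if score > best_score:
            best_score, best_path = score, offsets
    return best_score, best_path, n_paths

# Toy data: the true delay of a point scatterer wanders away from the prediction.
true_offsets = np.array([0, 1, 1, 2, 1, 0, -1])
n = len(true_offsets)
bins = np.arange(2 * n + 1) - n                      # offsets -n .. n
corr = np.exp(-(bins[None, :] - true_offsets[:, None]) ** 2.0)
score, path, n_paths = exhaustive_focus(corr)
print(n_paths, path)
```

With seven echoes the loop already visits 2187 trajectories; each additional echo triples the work.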
Figure 13. Sequentially formed test images indicate how well delays have been predicted and what
corrections  should  be  applied.  This  flow  diagram  shows  adaptive  focusing  operations  for 
target/sensor  motion  compensation,  image-based  tracking,  and  SAR/SAS  imaging  of 
maneuvering  targets. 
Figure 14. Sequential adaptive focusing of original echo data using test images. Wobble
of the suspended target introduces small range deviations.

(a) Images obtained by applying conventional and adaptive focusing (panels: "Conventional SAS
image" and "Adaptive autofocus").

(b) Adaptive range corrections applied to original echo data for best focus of target
centroid. Different range corrections may be applied to other parts of the target.
Figure 15. Sequential adaptive focusing of perturbed echo data using test images.
Unpredicted sensor or target motion is simulated by artificially reducing range by 10 mm
after the target is rotated by 200 degrees.

(a) Images obtained by applying conventional and adaptive focusing (panels: "Conventional SAS
image" and "Adaptive autofocus").

(b) Adaptive range corrections applied to echo data for best focus of the target centroid
when range is artificially reduced by 10 mm after the target is rotated by 200 degrees.
Different range corrections may apply to other parts of the target.
Computational complexity is greatly reduced by realizing that only the cumulative maximum
correlator response is required at any given time. Suppose that each node in Figure 16 at time $t_n$
is labeled with the maximum cumulative correlator response for all paths connecting to the node.
For a given node, this maximum is the incremental increase in the correlator response for the range
perturbation represented by the node, plus the largest cumulative correlator response for all nodes
at the preceding sample time $t_{n-1}$ that are connected to the node. For an isolated point scatterer,
the maximum cumulative correlator response at time $t_n$ denotes the best model for the perturbed
trajectory up to that time. Since the perturbations are admissible distortions in the hypothesized
range vs time function for a moving point target, the maximum correlator response is proportional
to the reflectivity of the point target.

The trajectory of a given target point is of interest in order to determine what kind of maneuver
was inferred by the adaptive focusing algorithm and to focus on reflecting points at intervals that
are shorter than the overall processing time. This history can be estimated by tracing back from
the node with largest cumulative correlator response at time $t_n$ to the connected node with largest
cumulative correlator response at time $t_{n-1}$. The trace-back is continued by finding the connected
node with largest cumulative correlator response at time $t_{n-2}$, etc.
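The node-labeling recursion and the trace-back described above amount to a Viterbi-style dynamic program. In the minimal sketch below, which uses the hypothetical table layout `corr[k, j]` (offset `j - n` bins, moves of -1, 0, or +1 bin per echo), each node stores the best cumulative correlator response over all admissible paths reaching it, plus a back-pointer to the connected predecessor that supplied it. The work therefore grows as roughly 3·n·(2n+1) node updates instead of $3^n$ paths. Following back-pointers is equivalent to picking, at each earlier time, the connected node with the largest cumulative response.

```python
import numpy as np

def dp_focus(corr):
    """Dynamic-programming focus over the lattice of admissible offsets.

    corr[k, j]: hypothetical correlator increment for echo k at delay offset
    j - n bins, where n = corr.shape[0] (requires corr.shape[1] >= 2n + 1).
    Returns the best cumulative response and the traced-back offset path."""
    n, m = corr.shape
    center = n                                   # column meaning 'no offset'
    cum = np.full(m, -np.inf)
    cum[center] = 0.0                            # every path starts at the prediction
    back = np.zeros((n, m), dtype=int)
    for k in range(n):
        new = np.full(m, -np.inf)
        for j in range(m):
            # connected predecessors: offsets j-1, j, j+1 at the previous time
            lo, hi = max(j - 1, 0), min(j + 1, m - 1)
            p = lo + int(np.argmax(cum[lo:hi + 1]))
            if np.isfinite(cum[p]):
                new[j] = cum[p] + corr[k, j]     # increment + best predecessor
                back[k, j] = p
        cum = new
    # trace back from the node with the largest cumulative response
    j = int(np.argmax(cum))
    best = float(cum[j])
    path = np.empty(n, dtype=int)
    for k in range(n - 1, -1, -1):
        path[k] = j - center
        j = back[k, j]
    return best, path

# Toy data: a point scatterer whose delay wanders from the prediction.
true_offsets = np.array([0, 1, 1, 2, 1, 0, -1])
n = len(true_offsets)
bins = np.arange(2 * n + 1) - n
corr = np.exp(-(bins[None, :] - true_offsets[:, None]) ** 2.0)
best, path = dp_focus(corr)
print(best, path)
```

On small random tables the result can be checked against exhaustive enumeration, which is how the test for this sketch is written.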

For  targets  with  multiple  scattering  points,  the  dynamic  programming  algorithm  should  not 
attempt  to  maximize  cumulative  correlator  response,  since  a  strongly  reflecting  point  may  be 
ephemeral  as  the  target  rotates.  A  more  reliable  performance  measure  is  the  peak-to-sidelobe  ratio 
of  the  SAS  range,  cross-range  ambiguity  function  or  point  spread  function,  which  increases  with 
the  mean-square  image  bandwidth  in  (20).  The  image  formation  process  is  sequential,  with  each 
new  transmitted  pulse  adding  another  increment  to  each  pixel.  The  image  after  n  pulses  is  the 
target  reflectivity  distribution  convolved  with  an  asterisk-shaped  point  spread  function  (RCRAF) 
with  peak-to-sidelobe  level  equal  to  n.  As  this  asterisk  becomes  more  impulse-like  with  higher  P/S, 
the image becomes less smeared, and the mean-square bandwidth increases. The sharpness
measure  in  (20)  is  thus  expected  to  increase  monotonically  with  n.  This  monotonic  increase  of  the 
performance  measure  is  required  for  dynamic  programming  [47,48].  A  dynamic  programming 
solution  to  the  delay  perturbation  problem  can  use  a  mean-square  bandwidth  measure  to  represent 
the  efficacy  of  delay  compensation,  and  trajectories  of  target  points  can  be  chosen  to  maximize  this 
image  sharpness  measure. 
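The peak-to-sidelobe statement can be checked numerically for an idealized noise-free case. The sketch below back-projects n echoes of a single point scatterer at the origin, with aspects spread uniformly over 180 degrees and a hypothetical Gaussian range profile of width `w`, and compares the peak with the largest value on the arms of the asterisk away from the crossing. It illustrates only the P/S = n part of the argument, not the behavior of the mean-square bandwidth.

```python
import numpy as np

def asterisk_psf(n, w=0.02, npix=101):
    """Back-projected image of one point scatterer at the origin after n echoes.

    Each echo contributes one Gaussian ridge (hypothetical width w) through
    the point, perpendicular to its look direction."""
    x = np.linspace(-1.0, 1.0, npix)
    X, Y = np.meshgrid(x, x)
    img = np.zeros_like(X)
    for k in range(n):
        th = np.pi * k / n                       # aspects spread over 180 degrees
        img += np.exp(-((X * np.cos(th) + Y * np.sin(th)) / w) ** 2)
    return X, Y, img

def peak_to_sidelobe(n, r_min=0.5):
    """Peak value divided by the largest value on the arms (radius > r_min)."""
    X, Y, img = asterisk_psf(n)
    arms = img[np.hypot(X, Y) > r_min]
    return img.max() / arms.max()

for n in (4, 8, 16):
    print(n, round(peak_to_sidelobe(n), 2))
```

Each echo adds one unit to the crossing point but only one ridge of unit height elsewhere, which is why the ratio tracks the number of echoes.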


Figure 16. Three admissible perturbations of the
predicted range at each echo time-of-arrival
yield a large set of admissible
range vs. time functions.
The size of a test image should be as large as possible to provide a reliable measure of
mean-square bandwidth. For maximum stability of the algorithm, the mean-square bandwidth focus
measure  should  be  computed  over  the  whole  target.  If  the  target  is  rigid  (or  behaves  as  though  it 
were  rigid  over  the  observation  interval)  the  delay  perturbation  at  any  target  point  can  be  computed 
from  the  perturbations  of  pitch,  roll,  and  yaw,  i.e.,  from  unpredictable  rotations  about  the  three 
axes  that  pass  through  the  target  center.  Instead  of  three  test  images  corresponding  to  no  change  in 
predicted  delay,  an  admissible  delay  increment,  and  an  admissible  delay  decrement,  the  algorithm 
must now compute twenty-seven test images, corresponding to all possible combinations of no
change,  an  admissible  increment,  and  an  admissible  decrement  of  the  predicted  pitch,  roll,  and  yaw 
angles.  In  each  image,  different  target  points  have  different  delay  corrections.  If  the  target  is 
contained  within  a  cube  of  volume  elements  (voxels),  then  the  hypothesized  delay  perturbation  of 
each  target  point  or  voxel  can  be  computed  from  the  perturbations  of  pitch,  roll,  and  yaw  angles. 
Corrections  to  predicted  pitch,  roll,  and  yaw  angles  can  be  inferred  from  the  test  image  with  the 
largest  mean-square  bandwidth. 
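The following sketch shows one way to turn the 27 pitch/roll/yaw hypotheses into per-voxel delay corrections for a rigid target. The axis assignment (roll about x, pitch about y, yaw about z), the rotation order, the look direction, the 1-degree step, the 5×5×5 voxel cube, and the sound speed of 1500 m/s are all illustrative assumptions. Each hypothesis rotates every voxel about the target center, and the two-way delay change is twice the change in the voxel's range projection onto the look direction, divided by the sound speed. Forming and scoring the 27 test images themselves is not shown.

```python
import itertools
import numpy as np

def rot(pitch, roll, yaw):
    """Rotation matrix for small pitch (about y), roll (about x), and yaw
    (about z) angles in radians. Axis assignment and order are assumptions."""
    cx, sx = np.cos(roll), np.sin(roll)
    cy, sy = np.cos(pitch), np.sin(pitch)
    cz, sz = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def delay_perturbations(voxels, look, step, c=1500.0):
    """Two-way delay change (s) of every voxel for each of the 27 combinations
    of {-step, 0, +step} applied to pitch, roll, and yaw about the center."""
    hyps = list(itertools.product((-step, 0.0, step), repeat=3))
    out = np.empty((len(hyps), len(voxels)))
    for h, (d_pitch, d_roll, d_yaw) in enumerate(hyps):
        moved = voxels @ rot(d_pitch, d_roll, d_yaw).T   # rotate about the center
        d_range = (moved - voxels) @ look                # range change along look
        out[h] = 2.0 * d_range / c                       # two-way delay change
    return hyps, out

g = np.linspace(-0.25, 0.25, 5)
voxels = np.array(list(itertools.product(g, g, g)))      # 125 voxels in a cube (m)
hyps, tau = delay_perturbations(voxels, np.array([1.0, 0.0, 0.0]), np.deg2rad(1.0))
print(len(hyps), tau.shape, np.abs(tau).max())
```

Voxels near the rotation axes receive almost no correction, while voxels far from the center receive the largest ones, so a single angle hypothesis implies a different delay correction at every point of the test image.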

A  motion  compensation  decision  at  each  node  in  Figure  16  can  be  obtained  by  computing 
different  test  images  corresponding  to  proposed  local  delay  corrections  or  equivalent  roll,  pitch, 
and  yaw  corrections.  The  test  image  with  the  largest  mean-square  bandwidth  corresponds  to  the 
best  local  delay  corrections,  or  the  best  proposed  pitch,  roll,  and  yaw  correction.  This  criterion  is 
based  on  the  assumption  that  the  2D  target  distribution  is  the  same  for  all  test  images.  The  only 
difference  between  the  test  images  is  the  shape  of  the  point  spread function  that  is  used  to 
estimate  them.  The  image  after  n  echoes  is  sequentially  constructed  from  the  best  test  images  for 
echoes  1,  ...,  n. 

The  interpretation  of  delay  perturbations  in  terms  of  pitch,  roll,  and  yaw  is  convenient  because 
these  angles  can  be  used  as  state  variables  in  a  dynamic  model  that  describes  target  behavior.  One 
version  of  the  state  equations  is  given  by  (21)  for  the  x,  y,  and  z  components  of  the  range  to  the 
target  centroid  and  by  (22)  for  3-D  target  rotations.  Corrections  to  the  predicted  state  vector  in 
(21)-(22)  are  obtained  from  evaluation  of  the  mean-square  bandwidth  of  test  images,  and  the 
corrected state vector is used to predict the next observation as in Kalman filtering [49,50]. The
dynamic  model  improves  the  prediction  of  individual  pixel  delays  by  computing  these  delays  in  the 
context  of  overall  target  motion. 


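$$
\begin{bmatrix} R_x \\ R_y \\ R_z \\ (d/dt)R_x \\ (d/dt)R_y \\ (d/dt)R_z \end{bmatrix}_{n+1}
=
\begin{bmatrix}
1 & 0 & 0 & \Delta t & 0 & 0 \\
0 & 1 & 0 & 0 & \Delta t & 0 \\
0 & 0 & 1 & 0 & 0 & \Delta t \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} R_x \\ R_y \\ R_z \\ (d/dt)R_x \\ (d/dt)R_y \\ (d/dt)R_z \end{bmatrix}_{n}
\qquad (21)
$$

$$
\begin{bmatrix} \mathrm{pitch} \\ \mathrm{roll} \\ \mathrm{yaw} \\ (d/dt)\,\mathrm{pitch} \\ (d/dt)\,\mathrm{roll} \\ (d/dt)\,\mathrm{yaw} \end{bmatrix}_{n+1}
=
\begin{bmatrix}
1 & 0 & 0 & \Delta t & 0 & 0 \\
0 & 1 & 0 & 0 & \Delta t & 0 \\
0 & 0 & 1 & 0 & 0 & \Delta t \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} \mathrm{pitch} \\ \mathrm{roll} \\ \mathrm{yaw} \\ (d/dt)\,\mathrm{pitch} \\ (d/dt)\,\mathrm{roll} \\ (d/dt)\,\mathrm{yaw} \end{bmatrix}_{n}
\qquad (22)
$$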
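The sketch below shows one way Eqs. (21)-(22) might drive prediction and correction. The 12-element state ordering (range components, range rates, pitch/roll/yaw, angle rates) is an assumption. The update is a fixed-gain alpha-beta correction, used here as a simplified stand-in for a full Kalman filter. `measured_pos` represents the range and angle values implied by the best test image, which this sketch takes as given.

```python
import numpy as np

def transition(dt):
    """6x6 constant-rate transition of Eqs. (21)-(22): values advance by dt
    times their rates; rates are held constant."""
    F = np.eye(6)
    F[:3, 3:] = dt * np.eye(3)
    return F

def predict(state, dt):
    """Predict the assumed 12-element state [R, dR/dt, angles, d(angles)/dt]."""
    F = transition(dt)
    return np.concatenate([F @ state[:6], F @ state[6:]])

POS = np.r_[0:3, 6:9]      # indices of range components and angles
RATE = np.r_[3:6, 9:12]    # indices of the corresponding rates

def correct(pred, measured_pos, dt, alpha=0.5, beta=0.1):
    """Hypothetical fixed-gain (alpha-beta) update in place of a Kalman update:
    the residual between the test-image values and the prediction nudges both
    the values and their rates."""
    resid = measured_pos - pred[POS]
    new = pred.copy()
    new[POS] += alpha * resid
    new[RATE] += (beta / dt) * resid
    return new

dt = 0.1
truth_rate = np.array([1.0, 0.0, -0.5])   # hypothetical range rates (m/s)
est = np.zeros(12)                        # motion initially unknown
for k in range(1, 201):
    est = predict(est, dt)
    measured = np.concatenate([truth_rate * k * dt, np.zeros(3)])
    est = correct(est, measured, dt)
print(est[RATE[:3]])                      # estimated range rates
```

With these gains the prediction error decays geometrically for a constant-rate target; a Kalman filter would instead compute the gains from process and measurement noise covariances.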

20.  Image  models  for  motion  compensation 

When  dynamic  programming  is  applied  to  speech  recognition,  admissible  time  warps 
correspond  to  constraints  on  the  discrete  delay  choices  in  Figure  16.  These  admissible  time  warps 
are  chosen  in  order  to  maximize  the  correlation  of  time-warped  data  with  each  phoneme  in  a  set  of 
reference  phonemes.  A  classifier  picks  the  reference  phoneme  that  has  maximum  correlation  with 
admissibly  time-warped  data.  This  process  can  be  adapted  to  imaging  by  combining  focusing  with 
classification.  Instead  of  using  mean-square  bandwidth  to  evaluate  test  images  that  correspond  to 
different  proposed  range  and  rotation  corrections,  the  test  images  are  correlated  with  reference 
templates.  The  updated  range  and  rotation  estimates  correspond  to  the  energy  normalized  test 
image  that  has  the  largest  correlation  with  a  given  reference  image.  A  different  sequence  of 
admissible  range  and  rotation  corrections  may  be  obtained  for  different  reference  templates. 
Correlation  of  each  final,  focused  version  of  the  image  with  the  corresponding  reference  template  is 
used  to  classify  the  target  and  to  identify  the  range  and  rotation  corrections  that  best  describe  the 
motion perturbations that occurred during the imaging process. This modeling approach is similar
to  associative  gradient  descent  in  ART-SAS,  where  comparisons  between  predicted  and  observed 
echo  data  are  used  for  tracking  as  well  as  for  image  synthesis. 
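One decision step of this template-based variant can be sketched as follows. The test images stand for hypothetical motion hypotheses (here, shifted copies of one image), the templates are made-up ring and cross shapes, and the score is the energy-normalized zero-lag correlation. For each template the best-matching hypothesis is kept, and the class is the template with the highest best match. The sequential accumulation over echoes described above is not modeled.

```python
import numpy as np

def normalized_corr(a, b):
    """Energy-normalized zero-lag correlation of two images."""
    return np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b))

def choose_and_classify(test_images, templates):
    """Per template, pick the motion hypothesis whose test image matches best;
    then classify by the template with the highest best match."""
    scores = np.array([[normalized_corr(t, ref) for t in test_images]
                       for ref in templates])
    best_hyp = scores.argmax(axis=1)                 # motion choice per template
    best_class = int(scores.max(axis=1).argmax())    # template with best fit
    return best_class, best_hyp

y, x = np.mgrid[-16:17, -16:17]
ring = np.exp(-((np.hypot(x, y) - 8.0) ** 2) / 4.0)   # hypothetical template 0
cross = np.exp(-(x ** 2) / 4.0) + np.exp(-(y ** 2) / 4.0)   # hypothetical template 1
tests = [np.roll(ring, s, axis=1) for s in (-2, 0, 2)]      # three motion hypotheses
cls, hyp = choose_and_classify(tests, [ring, cross])
print(cls, hyp)
```

Because each template selects its own best hypothesis, different templates can imply different motion corrections, as the text notes.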

21.  Image-based  tracking  and  the  2D  target  distribution  invariance  assumption 

The  utilization  of  test  image  bandwidth  to  make  a  decision  at  each  node  in  Figure  16  and  to 
choose  the  best  test  image  at  each  iteration  in  Figure  13  assumes  that  the  2D  target  distribution  in 
Figure 1 is the same for each test image at a given aspect angle. This assumption is circumvented
when  the  SAS  imaging  algorithm  is  given  the  freedom  to  make  local  adjustments  to  the  predicted 
delay  of  each  image  point.  For  a  3D  target,  parts  of  the  target  will  lie  above  and/or  below  the 
image plane that is determined by target rotation relative to the sonar in Figure 1. The freedom to
adjust  the  predicted  delay  for  motion  compensation  can  be  interpreted  as  the  freedom  to  move 
outside  the  image  plane  in  Figure  1  by  including  elevation  (z)  changes.  A  sharpness  measure  such 
as  mean-square  bandwidth  could  be  increased  by  moving  outside  the  designated  image  plane  as 
well  as  by  finding  the  best  delays  to  maximize  the  peak-to-sidelobe  ratio  of  the  point  spread 
function.  The  delay-corrected  image  will  have  large  mean-square  bandwidth  and  will  appear  to  be 
well  focused,  but  it  may  not  be  identical  to  the  2D  image  that  is  formed  without  delay 
corrections,  even  when  no  corrections  are  needed. 

The  monotonic  increase  of  mean-square  bandwidth  in  (20)  with  the  number  of  pulses  is  not 
affected  by  focusing  on  points  that  are  outside  the  original  image  plane,  since  such  focusing  occurs 
when  the  mean-square  bandwidth  is  larger  than  in  the  original  image  plane.  On  a  new  2D  image 
surface,  the  algorithm  can  still  place  all  the  line  segments  in  Figure  1  so  as  to  intersect  at  a  point, 
yielding a sharp point spread function as in Figure 2. Defocusing via smearing of the point spread
function  may  be  tolerated,  however,  in  order  to  exploit  an  image  on  a  different  constant-z  plane 
that  has  particularly  large  bandwidth  when  observed  from  a  new  aspect  angle.  If  this  situation 
occurs,  then  the  estimated  delay  of  each  point,  considered  by  itself,  becomes  an  unreliable  indicator 
of  target  motion  at  the  point.  The  relation  between  the  delay  perturbation  of  a  point  and  target 
motion  at  the  point  is  known  provided  that  delay  perturbations  are  contained  in  the  plane  of  target 
rotation.  The  relation  becomes  inaccurate  when  the  estimated  delay  has  an  unknown  out-of-plane 
z-component. This problem can be mitigated by using a 3D motion model as in (21)-(22) to
hypothesize  the  delay  of  each  point  and  to  form  the  corresponding  test  images. 

Although very little motion compensation is needed in Figure 14, application of the image-based
tracking algorithm has a large effect on the SAS image. This effect seems to be associated
with  the  fact  that  the  target  is  not  planar;  different  parts  of  the  target  lie  at  different  elevations. 

The  algorithm  automatically  chooses  a  delay  correction  that  corresponds  to  the  elevation  which 
maximizes the local focus criterion. Adaptive focusing extracts a maximally focused
two-dimensional image from three-dimensional target data. Conventional SAS is restricted to a single
range-azimuth  plane  without  the  ability  to  vary  elevation  for  best  focus. 

In Figure 15, at an aspect angle of 200 degrees, the target suddenly “lurches” toward the
sensor via a simulated delay perturbation, causing an unpredicted range decrease of 10 mm for all
succeeding echoes. Comparison of the conventional SAS images in Figs. 14 and 15 indicates that
the conventional image is smeared by the uncompensated motion, although the target is still
recognizable. As expected, the adaptively focused images in Figs. 14 and 15 are nearly identical
with and without the simulated motion perturbation of the sensor. If delay corrections are assumed
to occur in the original image plane, the delay corrections in Figure 15b should represent the 10 mm
delay step that was used to deliberately degrade the data at 200 degrees. Comparison of Figures 14b
and 15b, however, indicates that only part of the delay step is accounted for by delay compensation
in Figure 15. This inconsistency is caused by the limited step size of the admissible delay
correction at each observation, and by the fact that 3D delay corrections do not necessarily
correspond to delays in the original 2D image plane.

22.  Summary  of  adaptive  focusing  and  motion  compensation  via  image-based  tracking 

Mean-square  image  bandwidth  is  a  measure  of  the  peak-to-sidelobe  ratio  of  the  asterisk-shaped 
point  spread  function  (the  range,  cross-range  ambiguity  function)  of  a  SAS  imaging  system.  This 
measure  can  be  combined  with  a  dynamic  programming  approach  that  evaluates  test  images  in 
order  to  sequentially  correct  for  delay  perturbations.  The  corresponding  algorithm  has  been  tested 
with  wideband  sonar  data.  Improvements  and  modifications  of  the  basic  algorithm  include  (i)  a 
dynamic  model  for  prediction  of  position-dependent  delays  and  for  maneuver  estimation  from  test 
images  and  (ii)  a  SAS  analogue  to  a  time-warped  speech  classifier,  where  the  SAS  processor 
evaluates  delay  corrections  in  the  context  of  various  reference  templates.  Echo  cross-correlation  as 
in  traditional  motion  compensation  methods  can  be  included  in  the  focusing  criterion,  although 
counter-examples  indicate  that  cross-correlation  should  not  be  used  exclusively.  The  performance 
of  the  algorithm  in  multipath  and  other  adverse  conditions  has  not  yet  been  tested,  but  further 
improvements  are  possible  to  cope  with  such  conditions.  For  example,  an  animal  can  change  depth 
so  as  to  minimize  multipath,  and  it  can  roll  on  its  side  so  as  to  use  binaural  processing  as  an 
adaptive  null-steering  mechanism  for  multipath  removal. 




Other  applications  of  the  method  include 


(i)  A  SAR/SAS  processor  that  can  focus  on  moving  targets; 

(ii) Tracker design to avoid confusion caused by targets with scintillating multiple highlights or
scattering centers (an image-based tracker exploits these highlights to form a target image
that is consistent with apparent changes in highlight locations);

(iii)  For  medical  ultrasound,  large-array  adaptive  focusing  in  nonhomogeneous  media  and  motion 
compensation  for  tracking  and  imaging  blood  flow  and  heart  valves; 

(iv)  Large-array  adaptive  focusing  in  nonhomogeneous  media  for  seismic  geophysical  prospecting 
and  imaging  of  land  mines. 

23.  Biological  feasibility  of  simplified  SAS 

A  Doppler  tolerant,  semicoherent,  tomographic  SAS  model  is  comparatively  simple  to 
implement  and  thus  appears  to  be  feasible  for  animal  sonar.  For  dolphins,  the  model  can  be 
completely  noncoherent;  echoes  from  a  given  scattering  point  are  noncoherently  summed  as  the 
animal  moves.  Massively  parallel  processing  can  be  used  to  obtain  similar  noncoherent  sums  from 
many  neighboring  points,  thus  forming  an  image.  For  bats,  a  pulse  compression  operation  is 
necessary,  e.g.,  an  operation  as  in  (8)  that  may  be  equivalent  to  matched  filtering  or  inverse 
filtering.  A  deconvolution  process  which  is  equivalent  to  inverse  filtering  can  be  implemented  by 
applying  top-down,  bottom-up  gradient  descent  to  cochlear  echo  representations,  as  shown  in 
Section  9.  For  semicoherent  SAS,  pulse  compression  is  followed  by  envelope  detection.  After 
pulse  compression,  the  bat  model  is  the  same  as  for  dolphins. 
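The bat processing chain (pulse compression, envelope detection, then noncoherent summation over echoes from the same point) can be sketched as below. Matched filtering is used as the pulse-compression step, which is only one of the options mentioned above. The LFM replica, noise level, echo amplitude, random per-echo carrier phase, and 16-echo count are hypothetical. Because the envelope discards the carrier phase, the echoes add without pulse-to-pulse coherence, and the summed envelope peaks at the scatterer's position.

```python
import numpy as np

def envelope(x):
    """Envelope = magnitude of the analytic signal (FFT-based Hilbert transform)."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))

def compress_and_detect(echo, replica):
    """Pulse compression by matched filtering, then envelope detection."""
    return envelope(np.correlate(echo, replica, mode="same"))

# Hypothetical LFM sweep: 200 samples, 0.05 -> 0.25 cycles/sample.
L = 200
k = np.arange(L)

def lfm(phase0=0.0):
    return np.cos(2 * np.pi * (0.05 * k + 0.1 * k ** 2 / L) + phase0)

replica = lfm()
rng = np.random.default_rng(0)
n_samples, center = 1000, 500              # scatterer centered at sample 500
start = center - L // 2
total = np.zeros(n_samples)
for _ in range(16):                        # 16 echoes from the same point
    echo = rng.normal(0.0, 1.0, n_samples)                      # unit-variance noise
    echo[start:start + L] += 0.5 * lfm(rng.uniform(0, 2 * np.pi))   # random phase
    total += compress_and_detect(echo, replica)   # noncoherent sum of envelopes
print(int(np.argmax(total)))
```

For the dolphin case the pulse-compression step would be omitted and the envelopes of the received clicks summed directly.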

Evidence  already  exists  for  the  capability  of  echolocating  animals  to  form  a  noncoherent  sum  of 
echoes  from  a  given  scattering  point,  although  more  experiments  are  needed.  Echo  summation  or 
integration  capability  can  be  inferred  from  dolphin  target  recognition  experiments  [51]. 

Summation  also  can  be  inferred  from  neurophysiological  experiments  on  bats  [52]  in  which  the 
excitation  threshold  of  a  neuron  is  decreased  by  repeated  stimulation  of  the  neuron.  If  the  neuron  is 
modeled  as  a  sequential  likelihood  ratio  test  for  a  particular  stimulus  in  Gaussian  noise  [53],  then 
the  prior  probability  of  the  stimulus  increases  monotonically  with  the  sum  of  preceding  stimuli. 

For  a  Bayesian  hypothesis  test,  an  increase  in  the  prior  probability  of  the  stimulus  is  associated 
with a decrease of the excitation threshold [54]. Echo summation is thus encoded as a decrease in
neuronal threshold. The ability to concentrate on the same point in space as the animal moves is
implied by range tracking neurons in bats [55,56].
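A worked version of this argument for the simplest case assumes, as an illustration, a stimulus of known amplitude μ in Gaussian noise of variance σ². The minimum-error Bayes rule declares the stimulus present when $x > \mu/2 + (\sigma^2/\mu)\ln[(1-P_1)/P_1]$, where $P_1$ is the prior probability of the stimulus, so the threshold falls as $P_1$ rises. If the posterior after each observation becomes the next prior, its log-odds grow linearly with the sum of the preceding observations. The function names and default parameters are hypothetical.

```python
import numpy as np

def map_threshold(p1, mu=1.0, sigma=1.0):
    """Minimum-error threshold on x for H1: x ~ N(mu, sigma^2) versus
    H0: x ~ N(0, sigma^2), given the prior probability p1 of H1."""
    return mu / 2.0 + (sigma ** 2 / mu) * np.log((1.0 - p1) / p1)

def updated_prior(p1, xs, mu=1.0, sigma=1.0):
    """Posterior P(H1) after observations xs (all assumed to come from the same
    hypothesis), used as the prior for the next decision. The log-odds increase
    linearly with the sum of the observations."""
    xs = np.asarray(xs, dtype=float)
    log_odds = np.log(p1 / (1.0 - p1)) + np.sum(mu * xs - mu ** 2 / 2.0) / sigma ** 2
    return 1.0 / (1.0 + np.exp(-log_odds))

p = 0.5
for k in range(4):
    print(k, round(float(p), 3), round(float(map_threshold(p)), 3))
    p = updated_prior(p, [1.0])            # one more stimulus of size mu
```

Repeated stimulation raises the prior and therefore lowers the threshold, which is the behavior the neuron model attributes to echo summation.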

As  the  animal  moves,  the  ability  to  average  the  echo  from  each  point  seems  to  imply  that  a 
separate  neuron  is  assigned  to  each  image  pixel,  as  in  a  topographic  neuronal  map  like  the  ones 
found in the superior colliculus [37]. The relatively poor resolution of the superior colliculus map
can  be  improved  by  a  top-down,  bottom-up  gradient  descent  sharpening  process,  as  discussed  in 
Section  6.  This  improvement  is  not  manifested  in  the  map  itself,  but  in  an  interpretation  of  the 
map  by  a  higher  processing  center. 

An  important  counter-argument  for  the  existence  of  biological  SAS  is  that  high  resolution 
topographic  neuronal  maps  of  reflectivity  as  a  function  of  range  and  direction  have  yet  to  be 
discovered  at  higher  processing  centers  in  echolocating  animals.  There  are  several  possible 
explanations  for  this  missing  observation: 
1. Insufficient dimensionality may exist because of amplitude maps. Amplitopic representations
have been found in the bat auditory cortex. A neuron’s best amplitude varies monotonically
with its physical position relative to other amplitude-specific neurons [55]. If an amplitopic
representation  is  used  for  reflectivity  in  an  acoustic  image,  then  a  map  of  reflectivity  as  a 
function  of  range,  azimuth,  and  elevation  requires  four  dimensions.  Such  a  map  cannot  be 
constructed  topographically  with  a  three  dimensional  neuronal  array.  The  brain  is  forced  to 
use  either  a  non-topographic  representation  or  multiple  maps  that  are  different  projections  of  a 
higher  dimensional  representation.  Range,  azimuth,  and  elevation,  for  example,  may  be  coded 
non-topographically  as  additional  constraints  to  neuronal  excitations  within  a  population  of 
amplitopically  organized  neurons. 

2.  An  advantage  of  a  topographic  map  is  that  stimulus  representations  can  be  sharpened  by 
localized  lateral  inhibition.  Range/angle  sharpening,  however,  also  can  be  implemented  via 
top-down,  bottom-up  gradient  descent.  Topographic  maps  are  thus  convenient  but  not 
necessary  for  obtaining  better  resolution. 

3.  Images  may  be  dynamically  coded,  such  that  a  neuronal  topographic  map  represents  changes 
in an acoustic image rather than the image itself. Pulse-to-pulse jitter in the range and
cross-range coordinates of a point scatterer, for example, may be represented topographically, while
a  point  that  does  not  move  or  scintillate  between  transmissions  may  not  be  represented. 
Range-rate  and  rate  of  angular  change  may  be  represented  by  ordered  neuronal  maps. 

4.  A  relevant  map  may  be  associated  with  (or  projected  onto)  a  different  sensory  modality  such  as 
vision  or  somatosensation.  This  type  of  projection  or  association  is  suggested  by  facial 
sensations that are experienced by blind people [57]. In this case, a visual or somatosensory
map  with  comparatively  high  resolution  is  the  top-down  part  of  a  top-down,  bottom-up 
gradient  descent  sharpening  process.  Comparisons  between  predictions  and  data  are  made  in  a 
low  resolution  representation  with  registered  spatial  maps  from  different  sensors,  such  as  the 
superior  colliculus. 


24.  Coherent  SAS 

At present, there is insufficient evidence to conclude that bats or dolphins can perform coherent pulse-to-pulse integration, which is necessary for conventional, coherent SAS. Nevertheless, there is some motivation to consider coherent SAS. Bats are sensitive to a constant, frequency-independent echo phase shift [7,8], which is necessary but not sufficient for coherent SAS. Coherence implies that the SAS range/cross-range ambiguity function (RCRAF) is a coherent sum of rotated versions of the physical RCRAF of the sonar. In this case, the sonar RCRAF can be designed such that the SAS RCRAF has a very high peak-to-sidelobe ratio, even with large angle increments (i.e., with a small number of aspect angles and echoes). The required physical RCRAF corresponds to a particular set of signals that must be measured at various positions around the sonar transmitter. These signals resemble waveforms that have been measured around an echolocating dolphin [18,58].
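The difference between coherent and noncoherent summation can be seen in a small numerical sketch. It does not implement the designed RCRAF described above; it only compares the two ways of summing echoes. Assumptions (all hypothetical): a point scatterer at the origin, matched-filter outputs modeled as a Gaussian envelope of width `sigma` multiplied by a two-way carrier phase `exp(j*4*pi*r/lam)`, and 31 far-field aspects spanning ±15°. The image along the cross-range (y) axis is formed by delay-and-sum, either by summing the complex outputs (coherent) or their magnitudes (noncoherent).

```python
import numpy as np

lam = 0.01                                         # hypothetical carrier wavelength (m)
sigma = 0.01                                       # hypothetical echo-envelope width (m)
thetas = np.deg2rad(np.linspace(-15.0, 15.0, 31))  # aspect angles about the x-axis
y = np.linspace(-0.1, 0.1, 401)                    # cross-range samples at x = 0
dy = y[1] - y[0]


def matched_output(r):
    """Toy complex matched-filter output for a point at range offset r (two-way phase)."""
    return np.exp(-0.5 * (r / sigma) ** 2) * np.exp(1j * 4 * np.pi * r / lam)


# Range offset of image point (0, y) from a scatterer at the origin, for each aspect.
ranges = np.sin(thetas)[:, None] * y[None, :]
echoes = matched_output(ranges)                    # shape (n_aspects, n_y)

coherent = np.abs(echoes.sum(axis=0))              # sum complex outputs, then detect
noncoherent = np.abs(echoes).sum(axis=0)           # detect each output, then sum


def half_max_width(profile):
    """Width of the region where the profile is at least half its peak."""
    return dy * np.count_nonzero(profile >= 0.5 * profile.max())


w_coh = half_max_width(coherent)
w_non = half_max_width(noncoherent)
print(w_coh, w_non)
```

With these values the coherent half-maximum width is much smaller than the noncoherent one, because the carrier phase rather than the envelope width sets the cross-range resolution when the aspect span is small.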

25. Summary and Conclusion

A biological version of synthetic aperture imaging appears to be feasible, and it is advantageous for the detection and classification of prey (e.g., finding buried fish). The simplifications and generalizations used to obtain a biological model lead to new insights and capabilities for man-made SAS systems. These capabilities include generalized trackers for range-extended targets, new kinds of acoustic images that represent various target features, the capability to obtain acoustic images with observations from relatively few aspects, and an associative gradient descent model that uses prior information or hypotheses for fast convergence to a global minimum.

APPENDIX A: Back Projection and Synthetic Aperture Processing


In order to compare back projection with synthetic aperture imaging, it is helpful to review some properties of two-dimensional Fourier transforms. The first property is the expression for the 2D inverse Fourier transform in cylindrical coordinates. In Cartesian coordinates, the 2D inverse Fourier transform is

$$ f(x,y) = (2\pi)^{-2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} F(\omega_x,\omega_y)\exp[j(\omega_x x + \omega_y y)]\,d\omega_x\,d\omega_y. \tag{A1} $$


In cylindrical $(r,\theta)$ frequency-domain coordinates, $\omega_x = r\cos\theta$, $\omega_y = r\sin\theta$, and

$$ F(\omega_x,\omega_y) = F(r\cos\theta, r\sin\theta) \equiv F_{\mathrm{cyl}}(r,\theta). \tag{A2} $$

Cylindrical $(\rho,\phi)$ coordinates in the spatial domain are such that $x = \rho\cos\phi$, $y = \rho\sin\phi$, and

$$ f(x,y) = f(\rho\cos\phi, \rho\sin\phi) \equiv f_{\mathrm{cyl}}(\rho,\phi). \tag{A3} $$

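Substituting (A2) and (A3) into (A1), changing variables, and noting that the Jacobian of the joint change of variables $\omega_x = r\cos\theta$, $\omega_y = r\sin\theta$ is

$$ \begin{vmatrix} \partial\omega_x/\partial r & \partial\omega_y/\partial r \\ \partial\omega_x/\partial\theta & \partial\omega_y/\partial\theta \end{vmatrix} = \begin{vmatrix} \cos\theta & \sin\theta \\ -r\sin\theta & r\cos\theta \end{vmatrix} = r, \tag{A4} $$

so that the area element becomes $|r|\,dr\,d\theta$ when $r$ ranges over $(-\infty,\infty)$ and $\theta$ over $[-\pi/2,\pi/2]$, yields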


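$$ f_{\mathrm{cyl}}(\rho,\phi) = (2\pi)^{-2}\int_{-\pi/2}^{\pi/2}\int_{-\infty}^{\infty} F_{\mathrm{cyl}}(r,\theta)\exp[jr\rho(\cos\theta\cos\phi + \sin\theta\sin\phi)]\,|r|\,dr\,d\theta. \tag{A5} $$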


The desired expression for the 2D inverse Fourier transform in cylindrical coordinates is obtained by using the identity

$$ \cos\theta\cos\phi + \sin\theta\sin\phi = \cos(\theta-\phi) \tag{A6} $$

in (A5), which results in

$$ f_{\mathrm{cyl}}(\rho,\phi) = (2\pi)^{-2}\int_{-\pi/2}^{\pi/2}\int_{-\infty}^{\infty} F_{\mathrm{cyl}}(r,\theta)\exp[jr\rho\cos(\theta-\phi)]\,|r|\,dr\,d\theta. \tag{A7} $$


Rotation of an image in the x,y plane corresponds to a similar rotation of the 2D Fourier transform of the image in the $\omega_x,\omega_y$ plane. This property follows easily from (A7). Rotation in the x,y plane by $\gamma$ radians transforms $f_{\mathrm{cyl}}(\rho,\phi)$ to $f_{\mathrm{cyl}}(\rho,\phi+\gamma)$. Replacing $\phi$ by $\phi+\gamma$ on the right-hand side of (A7) and changing variables by letting $\theta' = \theta-\gamma$, the right side becomes the 2D inverse Fourier transform of $F_{\mathrm{cyl}}(r,\theta'+\gamma)$. It follows that

$$ f_{\mathrm{cyl}}(\rho,\phi+\gamma) \leftrightarrow F_{\mathrm{cyl}}(r,\theta+\gamma), \tag{A8} $$

$$ f(x\cos\gamma - y\sin\gamma,\; y\cos\gamma + x\sin\gamma) \leftrightarrow F(\omega_x\cos\gamma - \omega_y\sin\gamma,\; \omega_y\cos\gamma + \omega_x\sin\gamma), \tag{A9} $$

where the double arrow indicates a 2D Fourier transform pair.

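Another property of 2D Fourier transforms is that the projection of $f(x,y)$ onto the x-axis is the 1D inverse Fourier transform of $F(\omega_x,0)$. The projection of $f(x,y)$ onto the x-axis is

$$ P_0(x) = \int_{-\infty}^{\infty} f(x,y)\,dy. \tag{A10} $$

Integrating

$$ f(x,y) = (2\pi)^{-2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} F(\omega_x,\omega_y)\exp[j(\omega_x x + \omega_y y)]\,d\omega_x\,d\omega_y $$

with respect to $y$, and using $\int_{-\infty}^{\infty}\exp(j\omega_y y)\,dy = 2\pi\,\delta(\omega_y)$, yields

$$ P_0(x) = \int_{-\infty}^{\infty} f(x,y)\,dy = (2\pi)^{-1}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} F(\omega_x,\omega_y)\exp(j\omega_x x)\,\delta(\omega_y)\,d\omega_x\,d\omega_y = (2\pi)^{-1}\int_{-\infty}^{\infty} F(\omega_x,0)\exp(j\omega_x x)\,d\omega_x, \tag{A11} $$

which is the 1D inverse Fourier transform of $F(\omega_x,0)$. It follows that

$$ \int_{-\infty}^{\infty} P_0(x)\exp(-j\omega_x x)\,dx = F(\omega_x,0). \tag{A12} $$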
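The projection property (A10)-(A12) has an exact discrete counterpart: summing a sampled image over y and taking a 1D DFT gives the ω_y = 0 column of its 2D DFT. A minimal NumPy check, assuming the image is stored as `f[ix, iy]` with x along the first axis:

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal((32, 48))       # arbitrary sampled image f[ix, iy]

p0 = f.sum(axis=1)                      # discrete (A10): sum over y
lhs = np.fft.fft(p0)                    # 1D transform of the projection, as in (A12)
rhs = np.fft.fft2(f)[:, 0]              # 2D transform on the omega_y = 0 slice

print(np.allclose(lhs, rhs))            # True
```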


The above projection property can be generalized to rotated versions of $f(x,y)$. Rotating $f(x,y)$ by $\theta$ radians yields

$$ f_\theta(x,y) \equiv f(x\cos\theta - y\sin\theta,\; y\cos\theta + x\sin\theta). \tag{A13} $$

It follows from (A9) that the 2D Fourier transform of $f_\theta(x,y)$ is a similarly rotated version of $F(\omega_x,\omega_y)$:

$$ f_\theta(x,y) \leftrightarrow F_\theta(\omega_x,\omega_y) = F(\omega_x\cos\theta - \omega_y\sin\theta,\; \omega_y\cos\theta + \omega_x\sin\theta). \tag{A14} $$

Letting $P_\theta(x)$ denote the projection of $f_\theta(x,y)$ onto the x-axis, i.e.,

$$ P_\theta(x) = \int_{-\infty}^{\infty} f_\theta(x,y)\,dy, \tag{A15} $$

(A10)-(A12) imply that the 1D Fourier transform of $P_\theta(x)$ is $F_\theta(\omega_x,0)$. Using $r$ instead of $\omega_x$,

$$ \int_{-\infty}^{\infty} P_\theta(x)\exp(-jrx)\,dx = F_\theta(r,0) = F(r\cos\theta, r\sin\theta) = F_{\mathrm{cyl}}(r,\theta). \tag{A16} $$


The projection of a rotated version of the image $f(x,y)$ can be Fourier transformed in one dimension to obtain the 2D Fourier transform of the image in cylindrical coordinates, evaluated along a constant-$\theta$ slice in the frequency domain. This result is known as the projection-slice theorem [9]. It implies that a sequence of projections of incrementally rotated images can be used to obtain a sequence of constant-$\theta$ slices of the 2D Fourier transform of the image in cylindrical coordinates. The image can be reconstructed from its projections by computing an inverse 2D Fourier transform in cylindrical coordinates, as in (A7). This form of reconstruction is known as back projection.
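The rotated form (A16) can be checked numerically with a test image whose 2D Fourier transform is known in closed form. This sketch assumes a single isotropic Gaussian of hypothetical width `s` and centre `c`, the forward-transform convention exp(-jω·u) used in (A12) and (A16), and plain Riemann-sum quadrature on a finite grid. It rotates the image as in (A13), projects it onto the x-axis as in (A15), transforms the projection in 1D, and compares the result with F(r cos θ, r sin θ).

```python
import numpy as np

s, c = 0.5, np.array([0.7, -0.4])      # hypothetical Gaussian width and centre


def f(u, v):
    """Test image: one isotropic Gaussian."""
    return np.exp(-((u - c[0]) ** 2 + (v - c[1]) ** 2) / (2 * s ** 2))


def F(wx, wy):
    """Closed-form 2D Fourier transform of f, with the exp(-j w.u) convention."""
    return (2 * np.pi * s ** 2 * np.exp(-s ** 2 * (wx ** 2 + wy ** 2) / 2)
            * np.exp(-1j * (wx * c[0] + wy * c[1])))


h = 0.02
x = np.arange(-6.0, 6.0 + h / 2, h)
X, Y = np.meshgrid(x, x, indexing="ij")          # X varies along axis 0, Y along axis 1

theta = np.deg2rad(35.0)
f_theta = f(X * np.cos(theta) - Y * np.sin(theta),
            Y * np.cos(theta) + X * np.sin(theta))    # rotated image, (A13)
P_theta = f_theta.sum(axis=1) * h                     # projection onto the x-axis, (A15)

r = np.linspace(-4.0, 4.0, 9)
lhs = np.array([(P_theta * np.exp(-1j * rk * x)).sum() * h for rk in r])  # left side of (A16)
rhs = F(r * np.cos(theta), r * np.sin(theta))                            # F_cyl(r, theta)
print(np.max(np.abs(lhs - rhs)))
```

For a smooth, well-contained test image like this one the printed discrepancy is negligibly small (well below 1e-8); other values of `theta` behave the same way.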

To obtain a more explicit expression for the reconstructed image in terms of its projections, (A16) is solved for $P_\theta(x)$ by taking the inverse 1D Fourier transform of both sides of the equation:

$$ P_\theta(x) = (2\pi)^{-1}\int_{-\infty}^{\infty} F_{\mathrm{cyl}}(r,\theta)\exp(jrx)\,dr. \tag{A17} $$


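A gradual high-pass filter with transfer function $|r|$ (similar to differentiation, whose transfer function is $jr$, but without the corresponding phase shift) can be applied to the projection, yielding

$$ P_{\theta,\mathrm{HP}}(x) = (2\pi)^{-1}\int_{-\infty}^{\infty} F_{\mathrm{cyl}}(r,\theta)\exp(jrx)\,|r|\,dr. \tag{A18} $$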


The integral on the right-hand side of (A18) is contained in the 2D inverse Fourier transform expression (A7) in the form

$$ (2\pi)^{-1}\int_{-\infty}^{\infty} F_{\mathrm{cyl}}(r,\theta)\exp[jr\rho\cos(\theta-\phi)]\,|r|\,dr = P_{\theta,\mathrm{HP}}[\rho\cos(\theta-\phi)]. \tag{A19} $$

Substituting (A19) into (A7) yields

$$ f_{\mathrm{cyl}}(\rho,\phi) = (2\pi)^{-1}\int_{-\pi/2}^{\pi/2} P_{\theta,\mathrm{HP}}[\rho\cos(\theta-\phi)]\,d\theta. \tag{A20} $$


The original image $f_{\mathrm{cyl}}(\rho,\phi)$ can be reconstructed by summing high-pass filtered projections. To obtain the image $f(x,y)$ in Cartesian coordinates, recall from (A3) that

$$ f_{\mathrm{cyl}}(\rho,\phi) = f(\rho\cos\phi, \rho\sin\phi) = f(x,y) $$

and from (A6) that

$$ \cos(\theta-\phi) = \cos\theta\cos\phi + \sin\theta\sin\phi. $$

It follows that (A20) can be written

$$ f(\rho\cos\phi, \rho\sin\phi) = (2\pi)^{-1}\int_{-\pi/2}^{\pi/2} P_{\theta,\mathrm{HP}}[(\rho\cos\phi)\cos\theta + (\rho\sin\phi)\sin\theta]\,d\theta \tag{A21} $$

or

$$ f(x,y) = (2\pi)^{-1}\int_{-\pi/2}^{\pi/2} P_{\theta,\mathrm{HP}}(x\cos\theta + y\sin\theta)\,d\theta. \tag{A22} $$


Back projection algorithms typically use the projection-slice theorem to obtain the 2D Fourier transform of the image in cylindrical coordinates. The image is then reconstructed via a 2D inverse Fourier transform operation. The equivalent expression in (A22), however, is useful for illustrating the similarity between back projection and synthetic aperture processing.
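A compact filtered back projection sketch of (A18) and (A22). Assumptions (ours, for illustration): a phantom made of two isotropic Gaussians with hypothetical positions and widths, so each projection is a closed-form 1D Gaussian; the |r| filter applied with a zero-padded FFT; the θ-integral replaced by a uniform finite sum over [-π/2, π/2); and linear interpolation to read each filtered projection at x cos θ + y sin θ. The names `projection`, `ramp_filter` and `back_project` are hypothetical helpers, not routines from this report.

```python
import numpy as np

# Hypothetical phantom: two isotropic Gaussians (illustrative values only).
CENTRES = np.array([[0.5, 0.0], [-0.3, 0.6]])
AMPS = np.array([1.0, 0.5])
SIGMA = 0.3


def phantom(x, y):
    """True image f(x, y), used only to check the reconstruction."""
    return sum(a * np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * SIGMA ** 2))
               for (cx, cy), a in zip(CENTRES, AMPS))


def projection(theta, x):
    """Closed-form projection along lines of constant range u*cos(theta) + v*sin(theta) = x;
    each Gaussian projects to a 1D Gaussian."""
    return sum(a * np.sqrt(2 * np.pi) * SIGMA
               * np.exp(-(x - (cx * np.cos(theta) + cy * np.sin(theta))) ** 2 / (2 * SIGMA ** 2))
               for (cx, cy), a in zip(CENTRES, AMPS))


def ramp_filter(p, dx, nfft=2048):
    """Apply the |omega| high-pass filter of (A18) using a zero-padded FFT."""
    omega = 2 * np.pi * np.fft.fftfreq(nfft, d=dx)
    return np.real(np.fft.ifft(np.fft.fft(p, nfft) * np.abs(omega)))[:p.size]


def back_project(X, Y, n_angles=180, half_width=4.0, dx=0.02):
    """Finite-sum approximation of (A22) with theta uniformly spaced in [-pi/2, pi/2)."""
    x = np.arange(-half_width, half_width + dx / 2, dx)
    thetas = -np.pi / 2 + np.pi * np.arange(n_angles) / n_angles
    image = np.zeros_like(X)
    for th in thetas:
        q = ramp_filter(projection(th, x), dx)
        # Each image point takes the filtered projection at its range x*cos + y*sin.
        image += np.interp(X * np.cos(th) + Y * np.sin(th), x, q)
    return image * (np.pi / n_angles) / (2 * np.pi)


g = np.linspace(-1.2, 1.2, 49)
X, Y = np.meshgrid(g, g, indexing="ij")
recon = back_project(X, Y)
print(np.max(np.abs(recon - phantom(X, Y))))   # reconstruction error vs. the true image
```

Dropping the `ramp_filter` call gives plain (unfiltered) back projection, which returns a blurred version of the phantom rather than the phantom itself.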

For radar/sonar/ultrasound processing, suppose that a transducer is placed on the negative x-axis. A target is rotated about the origin of the coordinate system, and the range is defined to be zero at the center of rotation. Projections of the target are obtained by rotating the target clockwise and recording reflectivity vs. range (A-scan) data at each rotation after filtering to obtain an estimate of the target impulse response (e.g., by matched filtering). The integration surfaces for each projection correspond to points with constant delay, e.g., spherical shells. The thickness of the integration surfaces or shells is determined by the range resolution cell of the system, i.e., by the system bandwidth.

To track a point on the target with initial position $(x,y)$, the matched-filtered echo (A-scan) from the target is evaluated at range $x\cos\theta + y\sin\theta$ as the target is rotated. Equation (A22) describes a sum of high-pass filtered, matched-filtered echoes from the point on the target at initial position $(x,y)$ as the target is rotated.

The same A-scan data can be obtained by moving the transducer in a circle around the target, or by using a large array of transducers arranged in a circle with the target at the center. The second alternative is an actual array, while the first is a synthetic array. The array is focused on a target point by delay-and-sum beamforming. Consider a transducer that is located on the circle, $\theta$ radians counterclockwise relative to the negative x-axis. A signal is transmitted toward the target from this transducer, and the resulting echo is received by the same transducer and matched filtered or otherwise processed to estimate the target impulse response. The contribution of this filtered transducer output to the beamformer image of the target point at $(x,y)$ is a sample of the matched-filtered echo. This sample is chosen to correspond to the range of the target point, i.e., to a range of $x\cos\theta + y\sin\theta$ when range zero is at the center of the circle. The delay-and-sum beamformer for the real or synthetic array approximates the integral in (A22) by a finite sum over a sequence of aspect ($\theta$) values. Such a finite-sum approximation is also used in back projection. If the matched-filtered echoes are high-pass filtered by a filter with transfer function $|\omega|$, delay-and-sum synthetic aperture imaging and back projection are equivalent processes.
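The delay-and-sum reading of (A22) can be sketched from simulated A-scans. Assumptions (ours): far-field geometry, so a point at (x, y) has range x cos θ + y sin θ at aspect θ with range zero at the circle centre; each matched-filtered A-scan is modeled as a sum of Gaussian envelopes of width `sigma_r` at the scatterers' ranges; the scatterer positions and reflectivities are hypothetical; and no |ω| filter is applied, so this is the unfiltered, noncoherent delay-and-sum form.

```python
import numpy as np

scatterers = np.array([[0.2, 0.1], [-0.3, -0.25]])    # hypothetical point targets (x, y), m
amplitudes = np.array([1.0, 0.6])                      # hypothetical reflectivities
sigma_r = 0.01        # width of the matched-filtered echo envelope (m), set by bandwidth
n_aspects = 90
thetas = 2 * np.pi * np.arange(n_aspects) / n_aspects  # transducer positions around the circle
r_axis = np.arange(-1.0, 1.0, 0.001)                   # range samples (range 0 = circle centre)


def a_scan(theta):
    """Toy matched-filtered echo envelope vs. range for the transducer at aspect theta."""
    ranges = scatterers @ np.array([np.cos(theta), np.sin(theta)])
    env = np.exp(-0.5 * ((r_axis[:, None] - ranges[None, :]) / sigma_r) ** 2)
    return env @ amplitudes


g = np.round(np.arange(-0.5, 0.5 + 1e-9, 0.01), 10)   # image grid (m)
X, Y = np.meshgrid(g, g, indexing="ij")
image = np.zeros_like(X)
for th in thetas:
    scan = a_scan(th)
    # Delay-and-sum: sample this A-scan at the range x*cos(theta) + y*sin(theta) of every pixel.
    image += np.interp(X * np.cos(th) + Y * np.sin(th), r_axis, scan)

ix, iy = np.unravel_index(np.argmax(image), image.shape)
print(X[ix, iy], Y[ix, iy])      # brightest pixel
```

Passing each A-scan through an |ω| filter before the sum would make this the filtered back projection of (A22); without it, each point is surrounded by the slowly decaying blur typical of unfiltered back projection.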

One way to exploit the equivalence of SAS and BP is to form a 3D image when the transducer is above the plane of rotation. If the transducer is located above the negative x-axis such that the line between the transducer and the origin forms an angle $\alpha$ relative to the negative x-axis, and if the target is in the far field of the transducer, then (A22) becomes

$$ f(x,y,z) = (2\pi)^{-1}\int_{-\pi/2}^{\pi/2} P_{\theta,\mathrm{HP}}[(x\cos\theta + y\sin\theta)\cos\alpha - z\sin\alpha]\,d\theta. \tag{A23} $$
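Equation (A23) changes only the range assigned to each image point at each aspect. The sketch below forms a small 3D delay-and-sum image for an elevated transducer, under assumptions of our own: a hypothetical elevation `alpha`, a single point scatterer, a Gaussian echo envelope evaluated analytically, no |ω| filtering, and aspects spanning [-π/2, π/2) as in the limits of (A23).

```python
import numpy as np

alpha = np.deg2rad(30.0)                 # hypothetical elevation angle of the transducer
target = np.array([0.1, -0.2, 0.05])     # hypothetical point scatterer (x, y, z), m
sigma_r = 0.01                           # echo envelope width (m)
thetas = -np.pi / 2 + np.pi * np.arange(90) / 90   # aspects, as in the limits of (A23)


def slant_range(x, y, z, theta):
    """Far-field range of (x, y, z) at aspect theta for elevation alpha, as in (A23)."""
    return (x * np.cos(theta) + y * np.sin(theta)) * np.cos(alpha) - z * np.sin(alpha)


g = np.round(np.arange(-0.3, 0.3 + 1e-9, 0.05), 10)
X, Y, Z = np.meshgrid(g, g, g, indexing="ij")
image = np.zeros_like(X)
for th in thetas:
    echo_range = slant_range(*target, th)            # where the target's echo appears
    # Toy Gaussian echo envelope, evaluated directly at each voxel's slant range.
    image += np.exp(-0.5 * ((slant_range(X, Y, Z, th) - echo_range) / sigma_r) ** 2)

idx = np.unravel_index(np.argmax(image), image.shape)
print(X[idx], Y[idx], Z[idx])            # brightest voxel
```

In this toy setup the brightest voxel coincides with the scatterer, so the height z is recovered from range measurements taken at a single elevation angle.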




Appendix B: Cross correlation of proportional bandwidth spectrograms for wide-band signal processing


Proportional bandwidth spectrogram analysis is closely related to wavelet analysis. Two signals $u_1(t)$ and $u_2(t)$ may represent a received signal and a reference function, or the input signals at the two ears. In either case, the first goal is to cross correlate the two signals when they are represented by the phase-sensitive outputs of two proportional bandwidth spectrogram analyzers.

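As in the narrow-band, constant-bandwidth case, the signal cross-correlation function can be obtained by cross correlating the outputs of corresponding spectrogram filters in time and then summing the resulting cross-correlation functions over all the filters. The phase-sensitive spectrograms generated by proportional bandwidth filters are

$$ s_n(t,a) = a^{-1/2}(2\pi)^{-1}\int_{-\infty}^{\infty} U_n(\omega)V(\omega/a)e^{j\omega t}\,d\omega, \quad n = 1,2, \tag{B1} $$

where $U_n(\omega)$ is the Fourier transform of $u_n(t)$ and $V(\omega/a)$ is the transfer function of the filter with scale parameter $a$.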

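Cross correlating the two spectrograms in time at each $a$-value (i.e., cross correlating corresponding filter outputs) and summing over all the filters yields

$$ \int_{0}^{\infty} da\int_{-\infty}^{\infty} dt\,[s_1(t,a)\,s_2^{*}(t+\tau,a)] = (2\pi)^{-1}\int_{-\infty}^{\infty} U_1(\omega)\,U_2^{*}(\omega)\Big[\int_{0}^{\infty} a^{-1}|V(\omega/a)|^2\,da\Big]e^{-j\omega\tau}\,d\omega, \tag{B2} $$

where, for $\omega > 0$, the change of variables $x = \omega/a$ in the $a$-integral yields

$$ \int_{0}^{\infty} a^{-1}|V(\omega/a)|^2\,da = \int_{0}^{\infty}\frac{|V(x)|^2}{x}\,dx = C_V, \tag{B3} $$

as in wavelet analysis [64]. It follows that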


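$$ \int_{0}^{\infty} da\int_{-\infty}^{\infty} dt\,[s_1(t,a)\,s_2^{*}(t+\tau,a)] = C_V R_{u_1u_2}(\tau), \tag{B4} $$

where

$$ R_{u_1u_2}(\tau) = \int_{-\infty}^{\infty} u_1(t)\,u_2^{*}(t+\tau)\,dt \tag{B5} $$

is the desired phase-sensitive cross-correlation function between the two input signals.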
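A discrete check of (B1)-(B5). Assumptions (ours, for illustration): analytic signals defined directly by their DFTs on a band of positive frequencies; a hypothetical log-Gaussian filter V(x) = exp[-(ln x)^2/(2β^2)] for x > 0, for which C_V = ∫|V(x)|^2/x dx = β√π; scales a sampled uniformly in ln a, so that da = a Δ(ln a); and circular (DFT) correlations in place of the time integrals. The NumPy `fft`/`ifft` pair plays the role of the transform pair in (B1), including its 1/N normalization.

```python
import numpy as np

N = 1024
omega = 2 * np.pi * np.fft.fftfreq(N)            # rad/sample
rng = np.random.default_rng(1)

band = (omega > 0.2 * np.pi) & (omega < 0.6 * np.pi)   # positive-frequency band only
U1 = np.where(band, rng.standard_normal(N) + 1j * rng.standard_normal(N), 0)
U2 = U1 * np.exp(-1j * omega * 37) + 0.1 * np.where(band, rng.standard_normal(N), 0)

beta = 0.3
C_V = beta * np.sqrt(np.pi)                      # admissibility constant of this V, (B3)


def V(x):
    """Hypothetical log-Gaussian proportional-bandwidth filter, zero for x <= 0."""
    out = np.zeros_like(x)
    pos = x > 0
    out[pos] = np.exp(-np.log(x[pos]) ** 2 / (2 * beta ** 2))
    return out


def xcorr(s1, s2):
    """Circular sum_t s1[t] * conj(s2[t + tau]) for every lag tau."""
    return np.conj(np.fft.ifft(np.conj(np.fft.fft(s1)) * np.fft.fft(s2)))


d_log_a = 0.05
scales = np.exp(np.arange(-2.0, 2.0, d_log_a))
total = np.zeros(N, dtype=complex)
for a in scales:
    s1 = a ** -0.5 * np.fft.ifft(U1 * V(omega / a))   # discrete (B1)
    s2 = a ** -0.5 * np.fft.ifft(U2 * V(omega / a))
    total += xcorr(s1, s2) * a * d_log_a             # da = a * d(ln a)

reference = C_V * xcorr(np.fft.ifft(U1), np.fft.ifft(U2))   # C_V * R_{u1 u2}(tau), (B4)
print(np.max(np.abs(total - reference)) / np.max(np.abs(reference)))
```

The filter-bank sum matches C_V times the direct cross-correlation to within a tiny relative error, provided the scales cover the signal band with some margin on both sides.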

Another version of proportional bandwidth spectrogram analysis uses spectral distortion defined by

$$ U_E(\omega) = U(e^{\omega}). \tag{B6} $$

Since $e^{\omega}$ varies between zero and infinity, the mapping samples $U$ only at positive frequencies, so $U(\omega)$ is assumed to be analytic and thus to have support only at nonnegative frequencies. The mapping in (B6) automatically performs pulse compression on signals that have a logarithmic frequency-domain phase function $\exp[-jk\log(\omega)]$, because $\exp[-jk\log(e^{\omega})] = \exp(-jk\omega)$ is a linear phase, i.e., a pure delay of $k$ in the distorted domain. The corresponding time signals before pulse compression have linear period modulation and hyperbolic frequency modulation, and they resemble the echolocation signals of many bats.
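The pulse-compression property of (B6) can be checked numerically. Assumptions (ours): a spectrum U(ω) = A(ω) exp[-jk ln ω] with a hypothetical smooth amplitude A and coefficient k, sampled on a grid that is uniform in μ = ln ω. Before distortion the group delay -dφ/dω = k/ω is hyperbolic in frequency (instantaneous frequency k/t, so the period grows linearly with time); after the mapping ω = e^μ the phase is -kμ, which is linear, so transforming back from μ concentrates the energy at τ = k.

```python
import numpy as np

k = 40.0                                           # hypothetical log-phase coefficient
mu = np.linspace(np.log(0.5), np.log(2.0), 1001)   # uniform grid in mu = ln(omega)
omega = np.exp(mu)

A = np.exp(-0.5 * (mu / 0.3) ** 2)                 # hypothetical smooth amplitude spectrum
U = A * np.exp(-1j * k * np.log(omega))            # logarithmic frequency-domain phase

# Undistorted signal: group delay -d(phase)/d(omega) = k / omega, i.e. hyperbolic FM.
group_delay = -np.gradient(np.unwrap(np.angle(U)), omega)
print(np.allclose(group_delay[1:-1], k / omega[1:-1], rtol=1e-3))

# Distortion (B6): U_E(mu) = U(e^mu), i.e. U sampled uniformly in log frequency.
U_E = U
dmu = mu[1] - mu[0]
tau = np.linspace(0.0, 80.0, 401)                  # conjugate variable of mu (not seconds)
pulse = np.abs(np.exp(1j * np.outer(tau, mu)) @ U_E) * dmu / (2 * np.pi)
print(tau[np.argmax(pulse)])                       # compressed peak near tau = k
```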

The phase-sensitive spectrogram for signal and filter functions that are spectrally distorted as in (B6) is [65]

$$ s_{U_E V_E}(t,\log a) = (2\pi)^{-1}\int_{-\infty}^{\infty} U_E(\log\omega)\,V_E(\log\omega - \log a)\,e^{jt\log\omega}\,d(\log\omega), \tag{B7} $$

where

$$ U_E(\log\omega) = U(\omega) \tag{B8} $$

and

$$ V_E(\log\omega - \log a) = V(\omega/a). \tag{B9} $$

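Cross correlation of two such time-frequency representations yields

$$ \int_{-\infty}^{\infty} d\log a\int_{-\infty}^{\infty} dt\,[s_{U_{E1}V_E}(t,\log a)\,s^{*}_{U_{E2}V_E}(t+\tau,\log a)] = (2\pi)^{-1}\int_{-\infty}^{\infty} U_{E1}(\log\omega)\,U^{*}_{E2}(\log\omega)\Big[\int_{-\infty}^{\infty}|V_E(\log\omega-\log a)|^2\,d\log a\Big]e^{-j\tau\log\omega}\,d\log\omega = E_V R_{U_{E1}U_{E2}}(\tau), \tag{B10} $$

where $E_V = \int_{-\infty}^{\infty}|V_E(\mu)|^2\,d\mu$ is the filter energy and $R_{U_{E1}U_{E2}}(\tau)$ is the cross-correlation function of the spectrally distorted input signals.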

The phase-insensitive, squared-envelope spectrogram is obtained by envelope detecting the outputs of the filter bank:

$$ S_{U_E V_E}(t,\log a) = \left|(2\pi)^{-1}\int_{-\infty}^{\infty} U_E(\log\omega)\,V_E(\log\omega-\log a)\,e^{jt\log\omega}\,d(\log\omega)\right|^2. \tag{B11} $$


Solving (B12) for the squared envelope via the iterative procedure in Figure 8 yields

$$ |R_{U_{E1}U_{E2}}(\tau)|^2, \tag{B14} $$

the squared envelope of the cross-correlation function of the spectrally distorted input signals. As in the constant-bandwidth case, cross correlation of proportional bandwidth spectrograms can be used for pulse compression.